(54) Sequence-determined DNA fragments and corresponding polypeptides encoded thereby



(57) The present invention provides DNA molecules that constitute fragments of the genome of a plant, and
polypeptides encoded thereby. The DNA molecules are useful for specifying a gene product in cells, either as a
promoter or as a protein coding sequence or as an UTR or as a 3' termination sequence, and are also useful in
controlling the behavior of a gene in the chromosome, in controlling the expression of a gene or as tools for
genetic mapping, recognizing or isolating identical or related DNA fragments, or identification of a particular
individual organism, or for clustering of a group of organisms with a common trait.

Arabidopsis DNA is used in the present experiment, but the procedure is a general one.
Description

FIELD OF THE INVENTION

[0001] The present invention relates to isolated polynucleotides that represent a complete gene, or a fragment
thereof, that is expressed. In addition, the present invention relates to the polypeptide or protein corresponding
to the coding sequence of these polynucleotides. The present invention also relates to isolated polynucleotides
that represent regulatory regions of genes. The present invention also relates to isolated polynucleotides that
represent untranslated regions of genes. The present invention further relates to the use of these isolated
polynucleotides and polypeptides and proteins.

DESCRIPTION OF THE RELATED ART 

[0002] Efforts to map and sequence the genome of a number of organisms are in progress; a few complete genome
sequences, for example those of E. coli and Saccharomyces cerevisiae, are known (Blattner et al., Science 277:1453
(1997); Goffeau et al., Science 274:546 (1996)). The complete genome of a multicellular organism, C. elegans, has
also been sequenced (see the C. elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete
genome of a plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced.

SUMMARY OF THE INVENTION

[0003] The present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of
genomic DNA encompassing complete genes, fragments of genes, and/or regulatory elements of genes and/or regions
with other functions and/or intergenic regions, hereinafter collectively referred to as Sequence-Determined DNA
Fragments (SDFs), from different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana,
and other plants and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or proteins derived
therefrom. In some instances, the SDFs span the entirety of a protein-coding segment. In some instances, the
entirety of an mRNA is represented. Other objects of the invention that are also represented by SDFs of the
invention are control sequences, such as, but not limited to, promoters. Complements of any sequence of the
invention are also considered part of the invention.

[0004] Other objects of the invention are polynucleotides comprising exon sequences, polynucleotides comprising
intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5'
untranslated sequences, and 3' untranslated sequences of the SDFs of the present invention. Polynucleotides
representing the joinder of any exons described herein, in any arrangement, for example, to produce a sequence
encoding any desirable amino acid sequence are within the scope of the invention.

[0005] The present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize
to an SDF of the invention. The probes can be of any length, but more typically are 12-2000 nucleotides in length;
more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long.

[0006] Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the
following steps:

(a) contacting a probe of the instant invention with a polynucleotide sample under conditions that permit
hybridization and formation of a polynucleotide duplex; and

(b) detecting and/or isolating the duplex of step (a).

[0007] The conditions for hybridization can be from low to moderate to high stringency conditions. The sample can
include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful,
for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired fragment
of genomic DNA.

[0008] Probes and methods of the invention can also be used for detecting alternatively spliced messages within a
species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant
species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low
to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or
gDNA sequences of a plant. This approach is useful for isolating representatives of gene families which are
identifiable by possession of a common functional domain in the gene product or which have common cis-acting
regulatory sequences. This approach is also useful for identifying orthologous genes from other organisms.

[0009] The present invention also resides in constructs for modulating the expression of the genes comprised of all
or a fragment of an SDF. The constructs comprise all or a fragment of the expressed SDF, or of a complementary
sequence. Examples of constructs include ribozymes comprising RNA encoded by an SDF or by a sequence complementary
thereto, antisense constructs, constructs comprising coding regions or parts thereof, constructs comprising
promoters, introns, untranslated regions, scaffold attachment regions, methylating regions, enhancing or reducing
regions, DNA and chromatin conformation modifying sequences, etc. Such constructs can be constructed using viral,
plasmid, bacterial artificial chromosomes (BACs), plasmid artificial chromosomes (PACs), autonomous plant plasmids,
plant artificial chromosomes or other types of vectors and exist in the plant as autonomous replicating sequences or
as DNA integrated into the genome. When inserted into a host cell the construct is, preferably, functionally
integrated with, or operatively linked to, a heterologous polynucleotide. For instance, a coding region from an SDF
might be operably linked to a promoter that is functional in a plant.

[0010] The present invention also resides in host cells, including bacterial or yeast cells or plant cells, and plants
that harbor constructs such as described above. Another aspect of the invention relates to methods for modulating
expression of specific genes in plants by expression of the coding sequence of the constructs, by regulation of
expression of one or more endogenous genes in a plant or by suppression of expression of the polynucleotides of the
invention in a plant. Methods of modulation of gene expression include without limitation (1) inserting into a host
cell additional copies of a polynucleotide comprising a coding sequence; (2) modulating an endogenous promoter in a
host cell; (3) inserting antisense or ribozyme constructs into a host cell and (4) inserting into a host cell a
polynucleotide comprising a sequence encoding a variant, fragment, or fusion of the native polypeptides of the
instant invention.

BRIEF DESCRIPTION OF THE TABLES 


[0011] The sequences of exemplary SDFs and polypeptides corresponding to the coding sequences of the instant
invention are described in Reference Tables 1 and 2, "REF Tables 1 and 2"; and in Sequence Tables 1 and 2, "SEQ
Tables 1 and 2." The REF Tables refer to a number of "Maximum Length Sequences" or "MLS." Each MLS corresponds
to the longest cDNA obtained, either by cloning or by the prediction from genomic sequence. The sequence of the
MLS is the cDNA sequence as described in the (Ac) subsection of the REF Tables.

[0012] The REF Table includes the following information relating to each MLS:

I. cDNA Sequence

    A. 5' UTR
    B. Coding Sequence
    C. 3' UTR

II. Genomic Sequence

    A. Exons
    B. Introns
    C. Promoters

III. Link of cDNA Sequences to Clone IDs

IV. Multiple Transcription Start Sites

V. Polypeptide Sequences

    A. Signal Peptide
    B. Domains
    C. Related Polypeptides

VI. Related Polynucleotide Sequences

I. cDNA SEQUENCE

[0013] The REF Tables indicate which sequence in the SEQ Tables represents the sequence of each MLS. The MLS
sequence can comprise 5' and 3' UTR as well as coding sequences. In addition, specific cDNA clone numbers also
are included in the REF Tables when the MLS sequence relates to a specific cDNA clone.

A. 5' UTR 

[0014] The location of the 5' UTR can be determined by comparing the most 5' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional
start sites and ending at the last nucleotide before any of the translational start sites, corresponds to the 5' UTR.

B. Coding Region 


[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest
are indicated in the PolyP SEQ subsection of the REF Tables.
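
As an illustration of "any open reading frame found in the MLS", the following is a minimal sketch that scans the three forward frames of a cDNA string. The function name `find_orfs`, the ATG-to-stop definition of an ORF, and the toy sequence are assumptions made for the example; they are not taken from the REF Tables.

```python
# Sketch only: an ORF is assumed here to run from an ATG to the first
# in-frame stop codon (stop codon included). Only the forward strand is scanned.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end) spans, 0-based and half-open, of forward ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq):
                if seq[j:j + 3] in STOP_CODONS:
                    # (j - i) // 3 counts the codons before the stop codon
                    if (j - i) // 3 >= min_codons:
                        orfs.append((i, j + 3))
                    break
                j += 3
            i = j + 3  # resume after the stop codon (or past the end)
    return sorted(orfs)


mls = "GGATGAAATTTTAACC"          # hypothetical toy sequence
spans = find_orfs(mls)
coding = [mls[s:e] for s, e in spans]
```

ATG codons nested inside an already reported ORF are skipped in this sketch, so it reports one candidate per start rather than every possible translational start site.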

C. 3' UTR

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding
genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop
site and ending at the last nucleotide of the MLS, corresponds to the 3' UTR.
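
The 5' UTR, coding region and 3' UTR definitions above can be sketched as simple slicing of the MLS. This is an illustration under stated assumptions: coordinates are 1-based and inclusive, `stop_end` is taken to be the last base of the stop codon, and the 3' UTR is assumed to begin immediately after it. The function name and the toy sequence are hypothetical.

```python
def split_mls(mls, tss, translation_start, stop_end):
    """Split an MLS string into (5' UTR, coding region, 3' UTR).

    All positions are 1-based and inclusive (an assumption of this sketch):
      tss               - transcription start site within the MLS
      translation_start - first base of the start codon
      stop_end          - last base of the stop codon
    """
    five_utr = mls[tss - 1:translation_start - 1]   # up to the base before the start codon
    coding = mls[translation_start - 1:stop_end]    # start codon through stop codon
    three_utr = mls[stop_end:]                      # remainder of the MLS
    return five_utr, coding, three_utr


# Hypothetical toy sequence: 2-base 5' UTR, 12-base coding region, 2-base 3' UTR.
five_utr, coding, three_utr = split_mls("GGATGAAATTTTAACC", 1, 3, 14)
```

Because an MLS can have several transcription and translation start sites, the same function would be called once per combination of interest.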

II. GENOMIC SEQUENCE

[0017] Further, the REF Tables indicate the specific "gi" number of the genomic sequence if the sequence resides in
a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These
regions can include the 5' and 3' UTRs as well as the coding sequence of the MLS. See, for example, the scheme below.


             Region 1             Region 2            Region 3
Promoter | 5' UTR | Exon | Intron | Exon | Intron | Exon | 3' UTR
                  ^                                      ^
      Translational Start Site                      Stop Codon


[0018] The REF Tables report the first and last base of each region that are included in an MLS sequence. An example
is shown below:

gi No. 47000:
37102 ... 37497
37593 ... 37925

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region
including bases 37102-37497, and a second region including bases 37593-37925.
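
The region notation in the example above can be read mechanically. The following is a minimal sketch, assuming each region is written on its own line as `first ... last`; the function names are illustrative and not part of the REF Tables.

```python
import re

# Matches lines such as "37102... 37497" or "37593 ... 37925".
REGION_RE = re.compile(r"\s*(\d+)\s*\.\.\.\s*(\d+)\s*")


def parse_regions(lines):
    """Return a list of (first_base, last_base) tuples; non-matching lines are ignored."""
    regions = []
    for line in lines:
        match = REGION_RE.fullmatch(line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


def region_length(first_base, last_base):
    """Number of bases in a region; both end points are included."""
    return abs(last_base - first_base) + 1


regions = parse_regions(["gi No. 47000:", "37102... 37497", "37593 ... 37925"])
lengths = [region_length(first, last) for first, last in regions]
```

For the example from gi No. 47000 the two regions contribute 396 and 333 bases to the MLS.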

A. EXON SEQUENCES

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic
sequences with the corresponding MLS sequence as indicated by the REF Tables.

i. INITIAL EXON

[0020] To determine the location of the initial exon, information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables is used. First, the polypeptide section will indicate where the translational start site is located
in the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based
on the match between the MLS and corresponding genomic sequences, the location of the translational start site can
be determined in one of the regions of the genomic sequence. The location of this translational start site is the
start of the first exon.

[0021] Generally, the last base of the exon of the corresponding genomic region, in which the translational start site
was located, will represent the end of the initial exon. In some cases, the initial exon will end with a stop codon, when
the initial exon is the only exon.

[0022] In the case when sequences representing the MLS are in the positive strand of the corresponding genomic
sequence, the last base will be a larger number than the first base. When the sequences representing the MLS are in
the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first
base.
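
The strand rule above can be written as a small helper. This is a sketch only; the function name and the use of "+" and "-" as strand labels are assumptions, and a single-base region is reported as undetermined because the rule does not cover it.

```python
def strand_from_bases(first_base, last_base):
    """Infer the genomic strand of a region from its reported first and last base."""
    if last_base > first_base:
        return "+"   # positive strand: coordinates increase along the MLS
    if last_base < first_base:
        return "-"   # negative strand: coordinates decrease along the MLS
    return None      # single-base region: strand cannot be inferred


strands = [strand_from_bases(37102, 37497), strand_from_bases(37925, 37593)]
```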



ii. INTERNAL EXONS



[0023] Except for the regions that comprise the 5' and 3' UTRs, initial exon, and terminal exon, the remaining genomic
regions that match the MLS sequence are the internal exons. Specifically, the bases defining the boundaries of the
remaining regions also define the intron/exon junctions of the internal exons.

iii. TERMINAL EXON

[0024] As with the initial exon, the location of the terminal exon is determined with information from the

(1) polypeptide sequence section;
(2) cDNA polynucleotide section; and
(3) the genomic sequence section

of the REF Tables. The polypeptide section will indicate where the stop codon is located in the MLS sequence. The
MLS sequence can be matched to the corresponding genomic sequence. Based on the match between MLS and
corresponding genomic sequences, the location of the stop codon can be determined in one of the regions of the
genomic sequence. The location of this stop codon is the end of the terminal exon. Generally, the first base of the exon
of the corresponding genomic region that matches the cDNA sequence, in which the stop codon was located, will
represent the beginning of the terminal exon. In some cases, the translational start site will represent the start of the
terminal exon, which will be the only exon.

[0025] In the case when the MLS sequences are in the positive strand of the corresponding genomic sequence, the 
last base will be a larger number than the first base. When the MLS sequences are in the negative strand of the 
corresponding genomic sequence, then the last base will be a smaller number than the first base. 
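
The rules for initial, internal and terminal exons can be combined in one short sketch. It assumes that the regions are listed in 5' to 3' transcript order as (first base, last base) pairs, that the translational start and stop positions have already been mapped to genomic coordinates, and that `stop_site` is the last base of the stop codon. All names and coordinates are hypothetical.

```python
def _contains(region, position):
    """True if a genomic position lies inside a region, on either strand."""
    first, last = region
    return min(first, last) <= position <= max(first, last)


def classify_exons(regions, start_site, stop_site):
    """Split MLS-matching genomic regions into initial, internal and terminal exons."""
    i = next(k for k, r in enumerate(regions) if _contains(r, start_site))
    j = next(k for k, r in enumerate(regions) if _contains(r, stop_site))
    if i == j:
        # Single-exon case: the only exon runs from the start site to the stop codon.
        only = (start_site, stop_site)
        return {"initial": only, "internal": [], "terminal": only}
    return {
        "initial": (start_site, regions[i][1]),   # start site to last base of its region
        "internal": regions[i + 1:j],             # regions strictly between the two
        "terminal": (regions[j][0], stop_site),   # first base of its region to stop codon
    }


plus = classify_exons([(100, 200), (300, 400), (500, 600)], 150, 550)
minus = classify_exons([(600, 500), (400, 300)], 550, 350)
```

The same code handles the negative strand because each region keeps its reported orientation, with the last base smaller than the first.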



B. INTRON SEQUENCES 

[0026] In addition, the introns corresponding to the MLS are defined by identifying the genomic sequence located
between the regions where the genomic sequence comprises exons. Thus, introns are defined as starting one base
downstream of a genomic region comprising an exon, and ending one base upstream from a genomic region comprising
an exon.
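
The intron definition above amounts to taking the gaps between consecutive exon regions. A minimal sketch, assuming the regions are given in 5' to 3' transcript order as (first base, last base) pairs and that consecutive regions are separated by at least one base; the function name is illustrative.

```python
def introns_between(regions):
    """Return (first_base, last_base) of each intron between consecutive exon regions."""
    introns = []
    for (_, last_prev), (first_next, _) in zip(regions, regions[1:]):
        # Coordinates increase on the positive strand and decrease on the negative strand.
        step = 1 if first_next > last_prev else -1
        introns.append((last_prev + step, first_next - step))
    return introns


example = introns_between([(37102, 37497), (37593, 37925)])
```

For the two regions of gi No. 47000 quoted earlier, this gives a single intron spanning bases 37498 to 37592.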



C. PROMOTER SEQUENCES



[0027] As indicated below, promoter sequences corresponding to the MLS are defined as sequences upstream of 
the first exon; more usually, as sequences upstream of the first of multiple transcription start sites; even more usually 
as sequences about 2,000 nucleotides upstream of the first of multiple transcription start sites. 
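
The narrowest of the promoter definitions above, about 2,000 nucleotides upstream of the first transcription start site, can be sketched as a coordinate window. The function name, the 1-based coordinates and the strand labels are assumptions; on the negative strand the window is not clipped to the end of the genomic sequence, whose length is not known here.

```python
def promoter_window(first_tss, strand="+", length=2000):
    """Return (first_base, last_base) of the window upstream of the first TSS.

    Coordinates are 1-based genomic positions, written 5' to 3' on the gene's strand.
    """
    if strand == "+":
        # Upstream means smaller coordinates; never go below base 1.
        return (max(1, first_tss - length), first_tss - 1)
    # On the negative strand, upstream means larger genomic coordinates.
    return (first_tss + length, first_tss + 1)


window_plus = promoter_window(5000)
window_minus = promoter_window(5000, strand="-")
```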

III. LINK of cDNA SEQUENCES to CLONE IDs



[0028] As noted above, the REF tables identify the cDNA clone(s) that relate to each MLS. The MLS sequence can
be longer than the sequences included in the cDNA clones. In such a case, the REF table indicates the region of the
MLS that is included in the clone. If either the 5' or 3' terminus of the cDNA clone sequence is the same as that of
the MLS sequence, no mention will be made.



IV. Multiple Transcription Start Sites 

[0029] Initiation of transcription can occur at a number of sites of the gene. The REF tables indicate the possible
multiple transcription sites for each gene. In the REF tables, the location of the transcription start sites can be either
a positive or negative number. The positions indicated by positive numbers refer to the transcription start sites as
located in the MLS sequence. The negative numbers indicate the transcription start site within the genomic sequence
that corresponds to the MLS.

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is
aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced,
the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated
by the "gi" number shown in the public genomic DNA section of the REF tables. When the position is a negative number,
the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the
beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence
which matches the genomic sequence corresponding to the relevant "gi" number.
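
The handling of negative transcription start site numbers can be illustrated as a coordinate conversion. This sketch assumes that -1 denotes the genomic base immediately upstream of the base matching MLS base 1 (no position 0), and that the genomic coordinate of that matching base is already known from the alignment. The function name and the example numbers are hypothetical.

```python
def tss_genomic_position(tss, mls_first_base, strand="+"):
    """Convert a negative TSS number into a genomic coordinate.

    tss            - negative number from the REF tables
    mls_first_base - genomic coordinate aligned to the first base of the MLS
    strand         - "+" or "-" strand of the MLS in the genomic sequence
    """
    if tss >= 0:
        raise ValueError("positive TSS numbers are positions within the MLS itself")
    offset = -tss
    # Upstream is toward smaller coordinates on "+" and larger coordinates on "-".
    return mls_first_base - offset if strand == "+" else mls_first_base + offset


plus_site = tss_genomic_position(-25, 37102)
minus_site = tss_genomic_position(-25, 37925, strand="-")
```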

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence for alignment
is the nucleotide sequence associated with the amino acid sequence designated by "gi" number of the later PolyP SEQ
subsection.

V. Polypeptide Sequences

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NO for polypeptide sequences corresponding
to the coding sequence of the MLS sequence and the location of the translational start site within the coding sequence
of the MLS sequence.

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than 
one polypeptide sequence. 


A. Signal Peptide 

[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the
polypeptide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences
comprise a sequence encoding the first residue of the polypeptide to the cleavage site residue.
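
A short sketch of the signal peptide definition above, assuming the cleavage site is given as the 1-based position of the last signal peptide residue (so that residue is part of the signal peptide). The function names and the toy sequences are hypothetical.

```python
def split_signal_peptide(polypeptide, cleavage_residue):
    """Return (signal_peptide, mature_polypeptide) for a 1-based cleavage site residue."""
    return polypeptide[:cleavage_residue], polypeptide[cleavage_residue:]


def signal_peptide_coding_sequence(coding_sequence, cleavage_residue):
    """Return the codons encoding residue 1 through the cleavage site residue."""
    return coding_sequence[:3 * cleavage_residue]


signal, mature = split_signal_peptide("MKTLLVAAGS", 4)
signal_cds = signal_peptide_coding_sequence("ATGAAAACCCTGCTGGTT", 4)
```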

B. Domains 

[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and
(where present) a name for the polypeptide domain.

C. Related Polypeptides

[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be
related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ TABLES 1
AND 2. These related sequences are identified by a "gi" number.

VI. Related Polynucleotide Sequences

[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some
percentage of sequence identity to the MLS or corresponding genomic sequence.



Abbreviation                      Description

Max Len. Seq.                     Maximum Length Sequence
rel to                            Related to
Clone Ids                         Clone ID numbers
Pub gDNA                          Public Genomic DNA
gi No.                            gi number
Gen. seq. in cDNA                 Genomic Sequence in cDNA (Each region for a single gene prediction is
                                  listed on a separate line. In the case of multiple gene predictions, the
                                  group of regions relating to a single prediction are separated by a
                                  blank line)
(Ac) cDNA SEQ                     cDNA sequence
- Pat. Appln. SEQ ID NO           Patent Application SEQ ID NO:
- Ceres SEQ ID NO: 1673877        Ceres SEQ ID NO:
- SEQ # w. TSS                    Location within the cDNA sequence, SEQ ID NO:, of Transcription Start
                                  Sites which are listed below
- Clone ID #: # -> #              Clone ID comprises bases # to # of the cDNA Sequence
PolyP SEQ                         Polypeptide Sequence
- Pat. Appln. SEQ ID NO:          Patent Application SEQ ID NO:
- Ceres SEQ ID NO                 Ceres SEQ ID NO:
- Loc. SEQ ID NO: @ nt.           Location of translational start site in cDNA of SEQ ID NO: at nucleotide
                                  number
(C) Pred. PP Nom. & Annot.        Nomination and Annotation of Domains within Predicted Polypeptide(s)
- (Title)                         Name of Domain
- Loc. SEQ ID NO #: # -> # aa.    Location of the domain within the polypeptide of SEQ ID NO: from # to #
                                  amino acid residues.
(Dp) Rel. AA SEQ                  Related Amino Acid Sequences
- Align. NO                       Alignment number
- gi No                           Gi number
- Desp.                           Description
- % Idnt.                         Percent identity
- Align. Len.                     Alignment Length
- Loc. SEQ ID NO: # -> # aa       Location within SEQ ID NO: from # to # amino acid residue.



DETAILED DESCRIPTION OF THE INVENTION

[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as

IA. Probes, Primers and Substrates;

IB. Methods of Detection and Isolation;

B.1. Hybridization;
B.2. Methods of Mapping;
B.3. Southern Blotting;
B.4. Isolating cDNA from Related Organisms;
B.5. Isolating and/or Identifying Orthologous Genes

IC. Methods of Inhibiting Gene Expression

C.1. Antisense
C.2. Ribozyme Constructs;
C.3. Chimeraplasts;
C.4. Co-Suppression;
C.5. Transcriptional Silencing
C.6. Other Methods to Inhibit Gene Expression

ID. Methods of Functional Analysis;

IE. Promoter Sequences and Their Use;

IF. UTRs and/or Intron Sequences and Their Use; and

IG. Coding Sequences and Their Use.

[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as

IIA. Native Polypeptides and Proteins

A.1 Antibodies
A.2 In Vitro Applications

IIB. Polypeptide Variants, Fragments and Fusions

B.1 Variants
B.2 Fragments
B.3 Fusions

[0040] The invention also includes (III) methods of modulating polypeptide production, such as

IIIA. Suppression

A.1 Antisense
A.2 Ribozymes
A.3 Co-suppression
A.4 Insertion of Sequences into the Gene to be Modulated
A.5 Promoter Modulation
A.6 Expression of Genes containing Dominant-Negative Mutations

IIIB. Enhanced Expression

B.1 Insertion of an Exogenous Gene
B.2 Promoter Modulation

[0041] The invention further concerns (IV) gene constructs and vector construction, such as

IVA. Coding Sequences
IVB. Promoters
IVC. Signal Peptides

[0042] The invention still further relates to
(V) Transformation Techniques

Definitions 

[0043] Allelic variant: An "allelic variant" is an alternative form of the same SDF, which resides at the same
chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including
regulatory regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also
be produced by genetic engineering methods. An allelic variant can be one that is found in a naturally occurring
plant, including a cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or
may not be expressed. An allele can result in a detectable change in the phenotype of the trait represented by the
locus. A phenotypically silent allele can give rise to a product.

[0044] Alternatively spliced messages: Within the context of the current invention, "alternatively spliced messages"
refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons,
introns and/or intron-exon junctions.

[0045] Chimeric: The term "chimeric" is used to describe genes, as defined supra, or constructs wherein at least
two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory
sequences and/or filler sequences and/or complements thereof, are heterologous to each other.
[0046] Constitutive Promoter: Promoters referred to herein as "constitutive promoters" actively promote transcription
under most, but not necessarily all, environmental conditions and states of development or cell differentiation.
Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcript initiation region and
the 1' or 2' promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions
from various plant genes, such as the maize ubiquitin-1 promoter, known to those of skill.

[0047] Coordinately Expressed: The term "coordinately expressed," as used in the current invention, refers to
genes that are expressed at the same or a similar time and/or stage and/or under the same or similar environmental
conditions.

[0048] Domain: Domains are fingerprints or signatures that can be used to characterize protein families and/or
parts of proteins. Such fingerprints or signatures can comprise conserved (1) primary sequence, (2) secondary
structure, and/or (3) three-dimensional conformation. Generally, each domain has been associated with either a
family of proteins or motifs. Typically, these families and/or motifs have been correlated with specific in-vitro
and/or in-vivo activities. A domain can be any length, including the entirety of the sequence of a protein. Detailed
descriptions of the domains, associated families and motifs, and correlated activities of the polypeptides of the
instant invention are described below. Usually, the polypeptides with designated domain(s) can exhibit at least one
activity that is exhibited by any polypeptide that comprises the same domain(s).

[0049] Endogenous: The term "endogenous," within the context of the current invention, refers to any
polynucleotide, polypeptide or protein sequence which is a natural part of a cell or organisms regenerated from said
cell.
[0050] Exogenous: "Exogenous," as referred to within, is any polynucleotide, polypeptide or protein sequence,
whether chimeric or not, that is initially or subsequently introduced into the genome of an individual host cell or the
organism regenerated from said host cell by any means other than by a sexual cross. Examples of means by which
this can be accomplished are described below, and include Agrobacterium-mediated transformation (of dicots, e.g.,
Salomon et al., EMBO J. 3:141 (1984); Herrera-Estrella et al., EMBO J. 2:987 (1983); of monocots, representative
papers are those by Escudero et al., Plant J. 10:355 (1996), Ishida et al., Nature Biotechnology 14:745 (1996), May
et al., Bio/Technology 13:486 (1995)), biolistic methods (Armaleo et al., Current Genetics 17:97 (1990)),
electroporation, in planta techniques, and the like. Such a plant containing the exogenous nucleic acid is referred to
here as a T0 for the primary transgenic plant and T1 for the first generation. The term "exogenous" as used herein is
also intended to encompass inserting a naturally found element into a non-naturally found location.

[0051] Filler sequence: As used herein, "filler sequence" refers to any nucleotide sequence that is inserted into a
DNA construct to evoke a particular spacing between particular components such as a promoter and a coding region
and may provide an additional attribute such as a restriction enzyme site.

[0052] Gene: The term "gene," as used in the context of the current invention, encompasses all regulatory and coding
sequence contiguously associated with a single hereditary unit with a genetic function (see SCHEMATIC 1). Genes
can include non-coding sequences that modulate the genetic function that include, but are not limited to, those that
specify polyadenylation, transcriptional regulation, DNA conformation, chromatin conformation, extent and position of
base methylation and binding sites of proteins that control all of these. Genes comprised of "exons" (coding
sequences), which may be interrupted by "introns" (non-coding sequences), encode proteins. A gene's genetic
function may require only RNA expression or protein production, or may only require binding of proteins and/or
nucleic acids without associated expression. In certain cases, genes adjacent to one another may share sequence in
such a way that one gene will overlap the other. A gene can be found within the genome of an organism, artificial
chromosome, plasmid, vector, etc., or as a separate isolated entity.

[0053] Gene Family: "Gene family" is used in the current invention to describe a group of functionally related genes,
each of which encodes a separate protein.

[0054] Heterologous sequences: "Heterologous sequences" are those that are not operatively linked or are not
contiguous to each other in nature. For example, a promoter from corn is considered heterologous to an Arabidopsis
coding region sequence. Also, a promoter from a gene encoding a growth factor from corn is considered heterologous
to a sequence encoding the corn receptor for the growth factor. Regulatory element sequences, such as UTRs or 3'
end termination sequences that do not originate in nature from the same gene as the coding sequence originates from,
are considered heterologous to said coding sequence. Elements operatively linked in nature and contiguous to each
other are not heterologous to each other. On the other hand, these same elements remain operatively linked but become
heterologous if other filler sequence is placed between them. Thus, the promoter and coding sequences of a corn gene
expressing an amino acid transporter are not heterologous to each other, but the promoter and coding sequence of a
corn gene operatively linked in a novel manner are heterologous.

[0055] Homologous gene: In the current invention, "homologous gene" refers to a gene that shares sequence
similarity with the gene of interest. This similarity may be in only a fragment of the sequence and often represents a
functional domain such as, examples including without limitation, a DNA binding domain, a domain with tyrosine
kinase activity, or the like. The functional activities of homologous genes are not necessarily the same.
[0056] Inducible Promoter: An "inducible promoter" in the context of the current invention refers to a promoter
which is regulated under certain conditions, such as light, chemical concentration, protein concentration, conditions
in an organism, cell, or organelle, etc. A typical example of an inducible promoter, which can be utilized with the
polynucleotides of the present invention, is PARSK1, the promoter from the Arabidopsis gene encoding a
serine-threonine kinase enzyme, and which promoter is induced by dehydration, abscisic acid and sodium chloride
(Hwang and Goodman, Plant J. 8:37 (1995)). Examples of environmental conditions that may affect transcription by
inducible promoters include anaerobic conditions, elevated temperature, or the presence of light.
[0057] Intergenic region: "Intergenic region," as used in the current invention, refers to nucleotide sequence
occurring in the genome that separates adjacent genes.

[0058] Mutant gene: In the current invention, "mutant" refers to a heritable change in DNA sequence at a specific
location. Mutants of the current invention may or may not have an associated identifiable function when the mutant
gene is transcribed.

[0059] Orthologous Gene: In the current invention, "orthologous gene" refers to a second gene that encodes a
gene product that performs a similar function as the product of a first gene. The orthologous gene may also have a
degree of sequence similarity to the first gene. The orthologous gene may encode a polypeptide that exhibits a degree
of sequence similarity to a polypeptide corresponding to a first gene. The sequence similarity can be found within a
functional domain or along the entire length of the coding sequence of the genes and/or their corresponding
polypeptides.

[0060] Percentage of sequence identity "Percentage of sequence identity," as used herein, is determined by
comparing two optimally aligned sequences over a comparison window, where the fragment of the polynucleotide or
amino acid sequence in the comparison window may comprise additions or deletions (e.g., gaps or overhangs) as
compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two
sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid
base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number
of matched positions by the total number of positions in the window of comparison and multiplying the result by 100
to yield the percentage of sequence identity. Optimal alignment of sequences for comparison may be conducted by
the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment
algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and
Lipman, Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP,
BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group
(GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison,
GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00
for gap weight and 0.30 for gap weight length are used. The term "substantial sequence identity" between polynucleotide
or polypeptide sequences refers to a polynucleotide or polypeptide comprising a sequence that has at least 80%
sequence identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more
preferably, at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs.
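The calculation described in paragraph [0060] can be sketched as follows. This is a minimal illustration only, assuming the two sequences have already been optimally aligned by one of the programs named above and that gaps are written as "-"; the function name percent_identity is hypothetical and is not part of any of those programs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a comparison window of two pre-aligned sequences.

    Assumption: gaps are written as '-'. Every column of the window counts
    as a position; a column is a match only when both sequences carry the
    same base or residue (a gap never matches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    # matched positions / total positions in the window, times 100
    return 100.0 * matches / len(aligned_a)
```

For a nine-column window with seven matched columns, such as percent_identity("ACGT-ACGT", "ACGTTACGA"), the result is 7/9 x 100, which falls below the 80% threshold for "substantial sequence identity".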
[0061] Plant Promoter A "plant promoter" is a promoter capable of initiating transcription in plant cells and can
drive or facilitate transcription of a fragment of the SDF of the instant invention or a coding sequence of the SDF of the
instant invention. Such promoters need not be of plant origin. For example, promoters derived from plant viruses, such
as the CaMV35S promoter, or from Agrobacterium tumefaciens, such as the T-DNA promoters, can be plant promoters.
A typical example of a plant promoter of plant origin is the maize ubiquitin-1 (ubi-1) promoter known to those of skill.
[0062] Promoter The term "promoter," as used herein, refers to a region of sequence determinants located
upstream from the start of transcription of a gene and which are involved in recognition and binding of RNA polymerase
and other proteins to initiate and modulate transcription. A basal promoter is the minimal sequence necessary for
assembly of a transcription complex required for transcription initiation. Basal promoters frequently include a "TATA
box" element usually located between 15 and 35 nucleotides upstream from the site of initiation of transcription. Basal
promoters also sometimes include a "CCAAT box" element (typically a sequence CCAAT) and/or a GGGCG sequence,
usually located between 40 and 200 nucleotides, preferably 60 to 120 nucleotides, upstream from the start site of
transcription.

[0063] Public sequence The term "public sequence," as used in the context of the instant application, refers to
any sequence that has been deposited in a publicly accessible database. This term encompasses both amino acid
and nucleotide sequences. Such sequences are publicly accessible, for example, on the BLAST databases on the
NCBI FTP web site (accessible at ncbi.nlm.gov/blast). The database at the NCBI FTP site utilizes "gi" numbers assigned
by NCBI as a unique identifier for each sequence in the databases, thereby providing a non-redundant database for
sequences from various databases, including GenBank, EMBL, DDBJ (DNA Database of Japan) and PDB (Brookhaven
Protein Data Bank).

[0064] Regulatory Sequence The term "regulatory sequence," as used in the current invention, refers to any
nucleotide sequence that influences transcription or translation initiation and rate, and stability and/or mobility of the
transcript or polypeptide product. Regulatory sequences include, but are not limited to, promoters, promoter control
elements, protein binding sequences, 5' and 3' UTRs, transcriptional start site, termination sequence, polyadenylation
sequence, introns, certain sequences within a coding sequence, etc.
[0065] Related Sequences "Related sequences" refer to either a polypeptide or a nucleotide sequence that
exhibits some degree of sequence similarity with a sequence described by the REF and SEQ tables.
[0066] Scaffold Attachment Region (SAR) As used herein, "scaffold attachment region" is a DNA sequence that
anchors chromatin to the nuclear matrix or scaffold to generate loop domains that can have either a transcriptionally
active or inactive structure (Spiker and Thompson (1996) Plant Physiol. 110: 15-21).

[0067] Sequence-determined DNA fragments (SDFs) "Sequence-determined DNA fragments" as used in the
current invention are isolated sequences of genes, fragments of genes, intergenic regions or contiguous DNA from
plant genomic DNA or cDNA or RNA, the sequence of which has been determined.

[0068] Signal Peptide A "signal peptide" as used in the current invention is an amino acid sequence that targets
the protein for secretion, for transport to an intracellular compartment or organelle or for incorporation into a membrane.
Signal peptides are indicated in the tables and a more detailed description is located below.

[0069] Specific Promoter In the context of the current invention, "specific promoters" refers to a subset of
inducible promoters that have a high preference for being induced in a specific tissue or cell and/or at a specific time
during development of an organism. By "high preference" is meant at least a 3-fold, preferably 5-fold, more preferably
at least 10-fold, still more preferably at least 20-fold, 50-fold or 100-fold increase in transcription in the desired tissue
over the transcription in any other tissue. Typical examples of temporal and/or tissue specific promoters of plant origin
that can be used with the polynucleotides of the present invention are: PTA29, a promoter which is capable of driving
gene transcription specifically in tapetum and only during anther development (Koltunow et al., Plant Cell 2:1201 (1990));
RCc2 and RCc3, promoters that direct root-specific gene transcription in rice (Xu et al., Plant Mol. Biol. 27:237 (1995));
and TobRB27, a root-specific promoter from tobacco (Yamamoto et al., Plant Cell 3:371 (1991)). Examples of
tissue-specific promoters under developmental control include promoters that initiate transcription only in certain
tissues or organs, such as root, ovule, fruit, seeds, or flowers. Other suitable promoters include those from genes
encoding storage proteins or the lipid body membrane protein, oleosin. A few root-specific promoters are noted above.
[0070] Stringency "Stringency" as used herein is a function of probe length, probe composition (G + C content),
and salt concentration, organic solvent concentration, and temperature of hybridization or wash conditions. Stringency
is typically compared by the parameter Tm, which is the temperature at which 50% of the complementary molecules
in the hybridization are hybridized, in terms of a temperature differential from Tm. High stringency conditions are those
providing a condition of Tm - 5°C to Tm - 10°C. Medium or moderate stringency conditions are those providing Tm -
20°C to Tm - 29°C. Low stringency conditions are those providing a condition of Tm - 40°C to Tm - 48°C. The relationship
of hybridization conditions to Tm (in °C) is expressed in the mathematical equation

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\mathrm{G{+}C}) - \frac{600}{N} \qquad (1)$$

where N is the length of the probe. This equation works well for probes 14 to 70 nucleotides in length that are identical
to the target sequence. The equation below for Tm of DNA-DNA hybrids is useful for probes in the range of 50 to greater
than 500 nucleotides, and for conditions that include an organic solvent (formamide).

$$T_m = 81.5 + 16.6\,\log_{10}\left\{\frac{[\mathrm{Na}^+]}{1 + 0.7\,[\mathrm{Na}^+]}\right\} + 0.41\,(\%\mathrm{G{+}C}) - \frac{500}{L} - 0.63\,(\%\,\mathrm{formamide}) \qquad (2)$$

where L is the length of the probe in the hybrid. (P. Tijssen, "Hybridization with Nucleic Acid Probes" in Laboratory
Techniques in Biochemistry and Molecular Biology, P.C. van der Vliet, ed., c. 1993 by Elsevier, Amsterdam.) The Tm
of equation (2) is affected by the nature of the hybrid; for DNA-RNA hybrids Tm is 10-15°C higher than calculated, for
RNA-RNA hybrids Tm is 20-25°C higher. Because the Tm decreases about 1°C for each 1% decrease in homology
when a long probe is used (Bonner et al., J. Mol. Biol. 81:123 (1973)), stringency conditions can be adjusted to favor
detection of identical genes or related family members.
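The two Tm relationships of paragraph [0070] can be evaluated numerically as in the sketch below. It is an illustration under stated assumptions, not a validated calculator: it assumes the conventional form of both equations, in which the salt term is added (+16.6 log10[Na+]) and the formamide term is subtracted, with [Na+] in mol/L and %G+C and %formamide given as percentages. The function names are hypothetical.

```python
import math


def tm_short_probe(na_molar: float, gc_percent: float, n: int) -> float:
    """Equation (1): Tm in deg C for probes of roughly 14 to 70 nucleotides.

    Assumes the salt term enters with a plus sign (conventional form).
    """
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def tm_long_probe(
    na_molar: float,
    gc_percent: float,
    length: int,
    formamide_percent: float = 0.0,
) -> float:
    """Equation (2): Tm in deg C for DNA-DNA hybrids, probes of 50 to >500 nt.

    Assumes the formamide term is subtracted (conventional form).
    """
    salt_term = na_molar / (1.0 + 0.7 * na_molar)
    return (
        81.5
        + 16.6 * math.log10(salt_term)
        + 0.41 * gc_percent
        - 500.0 / length
        - 0.63 * formamide_percent
    )
```

Under these assumptions, a 20-nucleotide probe with 50% G+C in 1 M Na+ gives 81.5 + 0 + 20.5 - 30 = 72°C by equation (1), and adding 50% formamide lowers the equation (2) value by 31.5°C.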

[0071] Equation (2) is derived assuming equilibrium and therefore, hybridizations according to the present invention
are most preferably performed under conditions of probe excess and for sufficient time to achieve equilibrium. The
time required to reach equilibrium can be shortened by inclusion of a hybridization accelerator such as dextran sulfate
or another high volume polymer in the hybridization buffer.

[0072] Stringency can be controlled during the hybridization reaction or after hybridization has occurred by altering
the salt and temperature conditions of the wash solutions used. The formulas shown above are equally valid when
used to compute the stringency of a wash solution. Preferred wash solution stringencies lie within the ranges stated
above; high stringency is 5-8°C below Tm, medium or moderate stringency is 26-29°C below Tm and low stringency is
45-48°C below Tm.
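As a small worked illustration of paragraph [0072], the preferred wash windows can be turned into temperature ranges once a Tm has been computed. The sketch below simply encodes the three offset ranges stated in that paragraph; the dictionary and function names are hypothetical.

```python
# Preferred wash offsets below Tm, in deg C, as stated in paragraph [0072].
WASH_OFFSETS_C = {
    "high": (5, 8),
    "medium": (26, 29),
    "low": (45, 48),
}


def wash_temperature_range(tm_c: float, stringency: str) -> tuple:
    """Return (coolest, warmest) wash temperature in deg C for a stringency."""
    smallest_offset, largest_offset = WASH_OFFSETS_C[stringency]
    return (tm_c - largest_offset, tm_c - smallest_offset)
```

For a hybrid with a Tm of 72°C, a high stringency wash would fall between 64°C and 67°C on this reading.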

[0073] Substantially free of A composition containing A is "substantially free of" B when at least 85% by weight
of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in
the composition, more preferably at least about 95% or even 99% by weight. For example, a plant gene or a DNA
sequence can be considered substantially free of other plant genes or DNA sequences.

[0074] Translational start site In the context of the current invention, a "translational start site" is usually an ATG
in the cDNA transcript, more usually the first ATG. A single cDNA, however, may have multiple translational start sites.
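The definition in paragraph [0074] can be illustrated with a short sketch that lists every ATG in a cDNA sequence, the first of which is the most usual candidate start site. This is only an illustration: it scans all three reading frames of the given strand and makes no attempt to judge sequence context; the function name is hypothetical.

```python
def atg_positions(cdna: str) -> list:
    """Return the 0-based positions of every ATG in the cDNA, in order.

    The first entry, if any, is the most usual candidate translational
    start site; later entries are possible alternative start sites.
    """
    seq = cdna.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
```

For example, atg_positions("ccATGaaATGtt") reports two candidates, at positions 2 and 7.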
[0075] Transcription start site "Transcription start site" is used in the current invention to describe the point at
which transcription is initiated. This point is typically located about 25 nucleotides downstream from a TFIID binding
site, such as a TATA box. Transcription can initiate at one or more sites within the gene, and a single gene may have
multiple transcriptional start sites, some of which may be specific for transcription in a particular cell-type or tissue.
[0076] Untranslated region (UTR) A "UTR" is any contiguous series of nucleotide bases that is transcribed, but
is not translated. These untranslated regions may be associated with particular functions such as increasing mRNA
message stability. Examples of UTRs include, but are not limited to, polyadenylation signals, termination sequences,
sequences located between the transcriptional start site and the first exon (5' UTR) and sequences located between
the last exon and the end of the mRNA (3' UTR).
[0077] Variant The term "variant" is used herein to denote a polypeptide or protein or polynucleotide molecule
that differs from others of its kind in some way. For example, polypeptide and protein variants can consist of changes
in amino acid sequence and/or charge and/or post-translational modifications (such as glycosylation, etc.).

DETAILED DESCRIPTION OF THE INVENTION 


I. Polynucleotides 

[0078] Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, rice, soybean or
Arabidopsis and/or represent mRNA expressed from that genome. The isolated nucleic acid of the invention also
encompasses corresponding fragments of the genome and/or cDNA complement of other organisms as described in
detail below.

[0079] Polynucleotides of the invention can be isolated from polynucleotide libraries using primers comprising
sequence similar to those described by the REF and SEQ Tables. See, for example, the methods described in Sambrook
et al., supra.

[0080] Alternatively, the polynucleotides of the invention can be produced by chemical synthesis. Such synthesis
methods are described below.

[0081] It is contemplated that the nucleotide sequences presented herein may contain some small percentage of
errors. These errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can
be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating
genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the
genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the
amplification product.

I.A. Probes, Primers and Substrates

[0082] SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to,
assays of global gene expression, for example under varying conditions of development or growth. The arrays
can also be used in diagnostic or forensic methods (WO95/35505, US 5,445,943 and US 5,410,270).
[0083] Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in REF
and SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, the
sequences of REF and SEQ TABLES 1 AND 2 are generally preferred for encoding polypeptides of the invention.
However, the sequence of the probes and/or primers of the instant invention need not be identical to those in REF and
SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or
length can allow additional family members to be detected, as well as orthologous genes and more taxonomically
distant related sequences. Similarly, probes and/or primers of the invention can include additional nucleotides that
serve as a label for detecting the formed duplex or for subsequent cloning purposes.

[0084] Probe length will vary depending on the application. For use as primers, probes are 12-40 nucleotides,
preferably 18-30 nucleotides long. For use in mapping, probes are preferably 50 to 500 nucleotides, more preferably
100-250 nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.
[0085] The probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci
et al., J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea et al., Proc. Natl. Acad. 80:7461 (1981); or using
commercially available automated oligonucleotide synthesizers.
I.B. Methods of Detection and Isolation

[0086] The polynucleotides of the invention can be utilized in a number of methods known to those skilled in the art
as probes and/or primers to isolate and detect polynucleotides, including, without limitation: Southerns, Northerns,
branched DNA hybridization assays, polymerase chain reaction, and microarray assays, and variations thereof.
Specific methods given by way of examples, and discussed below, include:

Hybridization
Methods of Mapping
Southern Blotting
Isolating cDNA from Related Organisms
Isolating and/or Identifying Orthologous Genes.

Also, the nucleic acid molecules of the invention can be used in other methods, such as high density oligonucleotide
hybridizing assays, described, for example, in U.S. Pat. Nos. 6,004,753; 5,945,306; 5,945,287; 5,945,308; 5,919,686;
5,919,661; 5,919,627; 5,874,248; 5,871,973; 5,871,971; and 5,871,930; and PCT Pub. Nos. WO 9946380; WO
9933981; WO 9933870; WO 9931252; WO 9915658; WO 9906572; WO 9858052; WO 9958672; and WO 9810858.

B.1. Hybridization

[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be used as probes and/
or primers for detection and/or isolation of related polynucleotide sequences through hybridization. Hybridization of
one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention and the identified
related sequences. Also, such hybridization imposes structural limitations on the pair. A good general discussion of
the factors for determining hybridization conditions is provided by Sambrook et al. ("Molecular Cloning, a Laboratory
Manual", 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; see esp., chapters 11 and
12). Additional considerations and details of the physical chemistry of hybridization are provided by G.H. Keller and
M.M. Manak, "DNA Probes", 2nd Ed., pp. 1-25, c. 1993 by Stockton Press, New York, NY.

[0088] Depending on the stringency of the conditions under which these probes and/or primers are used,
polynucleotides exhibiting a wide range of similarity to those in REF and SEQ TABLES 1 AND 2 can be detected or isolated.
When the practitioner wishes to examine the result of membrane hybridizations under a variety of stringencies, an
efficient way to do so is to perform the hybridization under a low stringency condition, then to wash the hybridization
membrane under increasingly stringent conditions.

[0089] When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the
amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents are
present for each species examined. This prevents faint signals from species having large genomes, and thus small
numbers of genome equivalents per mass of DNA, from erroneously being interpreted as absence of the corresponding
gene in the genome.
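The adjustment described in paragraph [0089] amounts to scaling the mass of DNA loaded in proportion to genome size, since the number of genome equivalents per unit mass falls as the genome grows. A minimal sketch, with a hypothetical function name and genome sizes supplied by the practitioner:

```python
def mass_for_equal_genome_equivalents(
    reference_mass_ug: float,
    reference_genome_mb: float,
    target_genome_mb: float,
) -> float:
    """Mass of target-species DNA (in micrograms) carrying the same number
    of genome equivalents as reference_mass_ug of the reference species.

    Genome sizes may be in any unit as long as both use the same one.
    """
    if reference_genome_mb <= 0 or target_genome_mb <= 0:
        raise ValueError("genome sizes must be positive")
    return reference_mass_ug * target_genome_mb / reference_genome_mb
```

With purely illustrative sizes of 100 and 2,500 megabases, 1 microgram of the smaller genome corresponds to 25 micrograms of the larger one.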

[0090] The probes and/or primers of the instant invention can also be used to detect or isolate nucleotides that are
"identical" to the probes or primers. Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence
of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum
correspondence as described below.

[0091] Isolated polynucleotides within the scope of the invention also include allelic variants of the specific sequences
presented in REF and SEQ TABLES 1 AND 2. The probes and/or primers of the invention can also be used to detect
and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of REF and SEQ TABLES
1 AND 2 or fragments thereof.

[0092] With respect to nucleotide sequences, degeneracy of the genetic code provides the possibility to substitute
at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of
the polypeptide produced from the gene to be changed. Hence, the DNA of the present invention may also have any
base sequence that has been changed from a sequence in REF and SEQ TABLES 1 AND 2 by substitution in
accordance with degeneracy of the genetic code. References describing codon usage include: Carels et al., J. Mol. Evol.
46: 45 (1998) and Fennoy et al., Nucl. Acids Res. 21(23): 5294 (1993).
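The point made in paragraph [0092] can be checked with a short sketch that translates two DNA sequences differing by a synonymous substitution. It assumes the standard genetic code (laid out in the conventional T, C, A, G order) and in-frame input; the names CODON_TABLE and translate are illustrative and not taken from the source.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    seq = dna.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))
```

Here translate("ATGGCTTAA") and translate("ATGGCCTAA") both give "MA*": the third-position change from GCT to GCC leaves the encoded alanine unchanged.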

B.2. Mapping 


[0093] The isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of
the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. Some SDFs may be absolutely associated
with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be
associated with a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with phenotypes
of interest. Briefly, one method of mapping involves total DNA isolation from individuals. It is subsequently cleaved with
one or more restriction enzymes, separated according to mass, transferred to a solid support, hybridized with SDF
DNA and the pattern of fragments compared. Polymorphisms associated with a particular SDF are visualized as
differences in the size of fragments produced between individual DNA samples after digestion with a particular restriction
enzyme and hybridization with the SDF. After identification of polymorphic SDF sequences, linkage studies can be
conducted. By using the individuals showing polymorphisms as parents in crossing programs, F2 progeny recombinants
or recombinant inbreds, for example, are then analyzed. The order of DNA polymorphisms along the chromosomes
can be determined based on the frequency with which they are inherited together versus independently. The closer
two polymorphisms are together in a chromosome, the higher the probability that they are inherited together. Integration
of the relative positions of all the polymorphisms and associated marker SDFs can produce a genetic map of the
species, where the distances between markers reflect the recombination frequencies in that chromosome segment.
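The linkage reasoning of paragraph [0093] can be illustrated with a deliberately simplified sketch. It assumes each progeny line has been scored at two markers as carrying the allele of parent "A" or parent "B" (as for homozygous recombinant inbred lines), and it reports the raw fraction of lines in which the two markers disagree. It ignores heterozygotes, missing data, and the corrections needed to convert such a fraction into a map distance; the function name is hypothetical.

```python
def recombination_frequency(marker_1: str, marker_2: str) -> float:
    """Fraction of progeny whose parental allele differs between two markers.

    Each string holds one character per progeny line ('A' or 'B'), in the
    same line order for both markers.
    """
    if len(marker_1) != len(marker_2) or not marker_1:
        raise ValueError("markers must be scored on the same, non-empty set of lines")
    recombinants = sum(1 for a, b in zip(marker_1, marker_2) if a != b)
    return recombinants / len(marker_1)
```

Markers that are inherited together in most lines give a low value and are placed close together; markers that assort independently approach 0.5.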
[0094] The use of recombinant inbred lines for such genetic mapping is described for Arabidopsis by Alonso-Blanco
et al. (Methods in Molecular Biology, vol. 82, "Arabidopsis Protocols", pp. 137-146, J.M. Martinez-Zapater and J. Salinas,
eds., c. 1998 by Humana Press, Totowa, NJ) and for corn by Burr ("Mapping Genes with Recombinant Inbreds", pp.
249-254, in Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by Springer-Verlag New York, Inc.: New
York, NY, USA; Berlin, Germany); Burr et al., Genetics (1998) 118: 519; Gardiner, J. et al., (1993) Genetics 134: 917.
This procedure, however, is not limited to plants and can be used for other organisms (such as yeast) or for individual
cells.

[0095] The SDFs of the present invention can also be used for simple sequence repeat (SSR) mapping. Rice SSR
mapping is described by Morgante et al. (The Plant Journal 3: 165), Panaud et al. (Genome (1995) 38: 1170),
Senior et al. (Crop Science (1996) 36: 1676), Taramino et al. (Genome (1996) 39: 277) and Ahn et al. (Molecular and
General Genetics (1993) 241: 483-90). SSR mapping can be achieved using various methods. In one instance,
polymorphisms are identified when sequence-specific probes contained within an SDF flanking an SSR are made and used
in polymerase chain reaction (PCR) assays with template DNA from two or more individuals of interest. Here, a change
in the number of tandem repeats between the SSR-flanking sequences produces differently sized fragments (U.S.
Patent 5,766,847). Alternatively, polymorphisms can be identified by using the PCR fragment produced from the
SSR-flanking sequence-specific primer reaction as a probe against Southern blots representing different individuals (U.H.
Refseth et al., (1997) Electrophoresis 18: 1519).
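The size polymorphism described in paragraph [0095] follows directly from the repeat count: the PCR product spans both flanking regions plus the tandem repeats between them. The sketch below shows the arithmetic with hypothetical flank lengths and a hypothetical "GA" repeat unit; none of these values come from the source.

```python
def ssr_amplicon_length(
    left_flank_bp: int,
    right_flank_bp: int,
    repeat_unit: str,
    repeat_count: int,
) -> int:
    """Expected PCR product length for primers sitting in the SSR flanks.

    left_flank_bp and right_flank_bp are measured from the outer end of
    each primer to the edge of the repeat tract.
    """
    return left_flank_bp + len(repeat_unit) * repeat_count + right_flank_bp
```

Two individuals carrying 12 and 15 copies of a dinucleotide repeat would thus yield products differing by 6 base pairs, which can be resolved by electrophoresis.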

[0096] Genetic and physical maps of crop species have many uses. For example, these maps can be used to devise
positional cloning strategies for isolating novel genes from the mapped crop species. In addition, because the genomes
of closely related species are largely syntenic (that is, they display the same ordering of genes within the genome),
these maps can be used to isolate novel alleles from relatives of crop species by positional cloning strategies.
[0097] The various types of maps discussed above can be used with the SDFs of the invention to identify Quantitative
Trait Loci (QTLs). Many important crop traits, such as the solids content of tomatoes, are quantitative traits and result
from the combined interactions of several genes. These genes reside at different loci in the genome, oftentimes on
different chromosomes, and generally exhibit multiple alleles at each locus. The SDFs of the invention can be used to
identify QTLs and isolate specific alleles as described by de Vicente and Tanksley (Genetics 134:585 (1993)). In addition
to isolating QTL alleles in present crop species, the SDFs of the invention can also be used to isolate alleles from the
corresponding QTL of wild relatives. Transgenic plants having various combinations of QTL alleles can then be created
and the effects of the combinations measured. Once a desired allele combination has been identified, crop improvement
can be accomplished either through biotechnological means or by directed conventional breeding programs (for review
see Tanksley and McCouch, Science 277: 1063 (1997)).

[0098] In another embodiment, the SDFs can be used to help create physical maps of the genome of corn,
Arabidopsis and related species. Where SDFs have been ordered on a genetic map, as described above, they can be used
as probes to discover which clones in large libraries of plant DNA fragments in YACs, BACs, etc. contain the same
SDF or similar sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal positions.
Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more detailed studies of their sequence
composition (e.g. Marra et al. (1997) Genome Research 7:1072-1084) and by using their end or other sequences to
find the identical sequences in other cloned DNA fragments. The overlapping of DNA sequences in this way allows
large contigs of plant sequences to be built that, when sufficiently extended, provide a complete physical map of a
chromosome. Sometimes the SDFs themselves will provide the means of joining cloned sequences into a contig.
[0099] The patent publication WO95/35505 and U.S. Patents 5,445,943 and 5,410,270 describe scanning multiple
alleles of a plurality of loci using hybridization to arrays of oligonucleotides. These techniques are useful for each of
the types of mapping discussed above.

[0100] Following the procedures described above and using a plurality of the SDFs of the present invention, any
individual can be genotyped. These individual genotypes can be used for the identification of particular cultivars,
varieties, lines, ecotypes and genetically modified plants or can serve as tools for subsequent genetic studies involving
multiple phenotypic traits.

B.3 Southern Blot Hybridization 

[0101] The sequences from REF and SEQ TABLES 1 AND 2 can be used as probes for various hybridization
techniques. These techniques are useful for detecting target polynucleotides in a sample or for determining whether
transgenic plants, seeds or host cells harbor a gene or sequence of interest and thus might be expected to exhibit a particular
trait or phenotype.

[0102] In addition, the SDFs from the invention can be used to isolate additional members of gene families from the
same or different species and/or orthologous genes from the same or different species. This is accomplished by
hybridizing an SDF to, for example, a Southern blot containing the appropriate genomic DNA or cDNA. Given the resulting
hybridization data, one of ordinary skill in the art could distinguish and isolate the correct DNA fragments by size,
restriction sites, sequence and stated hybridization conditions from a gel or from a library.

[0103] Identification and isolation of orthologous genes from closely related species and alleles within a species is
particularly desirable because of their potential for crop improvement. Many important crop traits, such as the solid
content of tomatoes, result from the combined interactions of the products of several genes residing at different loci in
the genome. Generally, alleles at each of these loci can make quantitative differences to the trait. By identifying and
isolating numerous alleles for each locus from within or different species, transgenic plants with various combinations
of alleles can be created and the effects of the combinations measured. Once a more favorable allele combination has
been identified, crop improvement can be accomplished either through biotechnological means or by directed
conventional breeding programs (Tanksley et al., Science 277: 1063 (1997)).

[0104] The results from hybridizations of the SDFs of the invention to, for example, Southern blots containing DNA
from another species can also be used to generate restriction fragment maps for the corresponding genomic regions.
These maps provide additional information about the relative positions of restriction sites within fragments, further
distinguishing mapped DNA from the remainder of the genome.

[0105] Physical maps can be made by digesting genomic DNA with different combinations of restriction enzymes.
[0106] Probes for Southern blotting to distinguish individual restriction fragments can range in size from 15 to 20
nucleotides to several thousand nucleotides. More preferably the probe is 100 to 1,000 nucleotides long for identifying
members of a gene family when it is found that repetitive sequences would complicate the hybridization. For identifying
an entire corresponding gene in another species, the probe is more preferably the length of the gene, typically 2,000
to 10,000 nucleotides, but probes 50-1,000 nucleotides long might be used. Some genes, however, might require
probes up to 1,500 nucleotides long or overlapping probes constituting the full-length sequence to span their lengths.
[0107] Also, while it is preferred that the probe be homogeneous with respect to its sequence, it is not necessary.
For example, as described below, a probe representing members of a gene family having diverse sequences can be
generated using PCR to amplify genome DNA or RNA templates using primers derived from SDFs that include
sequences that define the gene family.

[0108] For identifying corresponding genes in another species, the next most preferable probe is a cDNA spanning
the entire coding sequence, which allows all of the mRNA-coding fragment of the gene to be identified. Probes for
Southern blotting can easily be generated from SDFs by making primers having the sequence at the ends of the SDF
and using corn or Arabidopsis genome DNA as a template. In instances where the SDF includes sequence conserved
among species, primers including the conserved sequence can be used for PCR with genome DNA from a species of
interest to obtain a probe. Similarly, if the SDF includes a domain of interest, that fragment of the SDF can be used to
make primers and, with appropriate template DNA, used to make a probe to identify genes containing the domain.
Alternatively, the PCR products can be resolved, for example by gel electrophoresis, and cloned and sequenced.
Using Southern hybridization, the variants of the domain among members of a gene family, both within and across
species, can be examined.

B.4.1 Isolating DNA from Related Organisms 

[0109] The SDFs of the invention can be used to isolate the corresponding DNA from other organisms. Either cDNA
or genome DNA can be isolated. For isolating genomic DNA, a lambda, cosmid, BAC or YAC, or other large insert
genomic library from the plant of interest can be constructed using standard molecular biology techniques as described
in detail by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory
Press, New York) and by Ausubel et al. 1992 (Current Protocols in Molecular Biology, Greene Publishing, New York).

[0110] To screen a phage library, for example, recombinant lambda clones are plated out on appropriate bacterial
medium using an appropriate E. coli host strain. The resulting plaques are lifted from the plates using nylon or
nitrocellulose filters. The plaque lifts are processed through denaturation, neutralization, and washing treatments following
the standard protocols outlined by Ausubel et al. (1992). The plaque lifts are hybridized to either radioactively labeled
or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50%
formamide and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents. The plaque lifts are then washed
at 42°C with 1% sodium dodecyl sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used
is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For
example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or
preferably adjusted to a less stringent condition (e.g., Tm - 30°C) to wash the plaque lifts. Positive clones show
detectable hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones are then
isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis
can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding
subcloning steps can be done using procedures described by, for example, Sambrook et al. (1989) cited above.
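
The relationship between a hybrid's melting temperature and the wash conditions expressed as "Tm - 20°C" or "Tm - 30°C" can be illustrated with a short calculation. The following is a minimal sketch only. The empirical Tm approximation, its coefficients, and the example probe values are assumptions introduced for illustration and are not taken from this specification.

```python
import math


def estimate_tm_celsius(gc_percent, length_bp, na_molar, formamide_percent=0.0):
    """Estimate the Tm of a DNA:DNA hybrid with a common empirical approximation.

    Assumed formula (illustrative, not from the specification):
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.63*(%formamide) - 600/length
    """
    if length_bp <= 0 or na_molar <= 0:
        raise ValueError("length_bp and na_molar must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_bp)


def wash_temperature(tm_celsius, degrees_below_tm):
    """Return the wash temperature for a stringency given as 'Tm - X'."""
    return tm_celsius - degrees_below_tm


# Hypothetical probe: 500 bp, 45% GC, washed in 0.1 M Na+ without formamide.
tm = estimate_tm_celsius(gc_percent=45.0, length_bp=500, na_molar=0.1)
medium_wash = wash_temperature(tm, 20.0)   # medium stringency, Tm - 20
relaxed_wash = wash_temperature(tm, 30.0)  # less stringent, Tm - 30
print(f"Tm={tm:.2f} C, Tm-20={medium_wash:.2f} C, Tm-30={relaxed_wash:.2f} C")
```

In practice the salt concentration would be derived from the SSC dilution actually used in the wash, and the estimate should be treated as a starting point to be confirmed empirically.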

[0111] The procedures outlined for the lambda library are essentially similar to those used for YAC library screening,
except that the YAC clones are harbored in bacterial colonies. The YAC clones are plated out at reasonable density
on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates. Following the growth of the
bacterial clones, the filters are processed through the denaturation, neutralization, and washing steps following the
procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library screening are followed.
[0112] To isolate cDNA, similar procedures using appropriately modified vectors are employed. For instance, the
library can be constructed in a lambda vector appropriate for cloning cDNA such as λgt11. Alternatively, the cDNA
library can be made in a plasmid vector. cDNA for cloning can be prepared by any of the methods known in the art,
but is preferably prepared as described above. Preferably, a cDNA library will include a high proportion of full-length
clones.

B.5. Isolating and/or Identifying Orthologous Genes

[0113] Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in
REF and SEQ TABLES 1 AND 2. Related polynucleotides are those that are native to other plant organisms and exhibit
either similar sequence or encode polypeptides with similar biological activity. One specific example is an orthologous
gene. Orthologous genes have the same functional activity. As such, orthologous genes may be distinguished from
homologous genes. The percentage of identity is a function of evolutionary separation and, in closely related species,
the percentage of identity can be 98 to 100%. The amino acid sequence of a protein encoded by an orthologous gene
can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%,
most preferably at least 95% identical to the amino acid sequence of the reference protein. To find orthologous genes,
the probes are hybridized to nucleic acids from a species of interest under low stringency conditions, preferably one
where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established
by Tm - 40°C to Tm - 48°C (see below). Blots are then washed under conditions of increasing stringency. It is preferable
that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences
90 to 100% identical will hybridize and most preferably only sequences greater than 95% identical will hybridize. One
of ordinary skill in the art will recognize that, due to degeneracy in the genetic code, amino acid sequences that are
identical can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for example, to
make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides, and individually hybridize them to
the same arrayed library to avoid the problem of degeneracy introducing large numbers of mismatches.
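
The overlapping series of short probes described above can be sketched as a simple tiling of the SDF sequence. This is an illustrative sketch only. The function name `tile_probes`, the default probe length of 30 nucleotides, and the step of 15 nucleotides are assumptions chosen for the example. Only the 24 to 45 nucleotide range comes from the text.

```python
def tile_probes(sequence, probe_len=30, step=15):
    """Split a sequence into overlapping probes of fixed length.

    Returns a list of (start_index, probe_sequence) tuples. The last probe is
    shifted so that it ends exactly at the 3' end of the sequence.
    """
    if not 24 <= probe_len <= 45:
        raise ValueError("probe_len should be on the order of 24 to 45 nucleotides")
    if not 0 < step < probe_len:
        raise ValueError("step must be positive and smaller than probe_len to overlap")
    seq = sequence.upper()
    if len(seq) < probe_len:
        return []
    last_start = len(seq) - probe_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # cover the tail of the sequence
    return [(s, seq[s:s + probe_len]) for s in starts]


# Hypothetical 100-nucleotide fragment used only to show the tiling.
example = "ACGT" * 25
for start, probe in tile_probes(example):
    print(start, probe)
```

Each probe would then be hybridized individually to the same arrayed library, so that a mismatch-rich region affects only the probes that span it.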

[0114] As evolutionary divergence increases, genome sequences also tend to diverge. Thus, one of skill will
recognize that searches for orthologous genes between more divergent species will require the use of lower stringency
conditions compared to searches between closely related species. Also, degeneracy of the genetic code is more of a
problem for searches in the genome of a species more distant evolutionarily from the species that is the source of the
SDF probe sequences.
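
The effect of codon degeneracy on nucleotide identity can be shown with a small worked example. The sketch below assumes the standard genetic code and uses two made-up 15-nucleotide sequences that differ at every third codon position. Neither sequence is taken from the REF and SEQ Tables.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA coding sequence in frame 1, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# Hypothetical sequences: every codon differs only at its third position.
seq_a = "GCTGGTCCTACTGTT"
seq_b = "GCAGGACCAACAGTA"

print(translate(seq_a), translate(seq_b))
print(round(percent_identity(seq_a, seq_b), 1))
```

Both sequences encode the same peptide even though only two of every three nucleotides match, which is the situation that short overlapping probes and lower stringency are meant to accommodate.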

[0115] The SDFs of the invention can also be used as probes to search for genes that are related to the SDF within
a species. Such related genes are typically considered to be members of a gene family. In such a case, the sequence
similarity will often be concentrated into one or a few fragments of the sequence. The fragments of similar sequence
that define the gene family typically encode a fragment of a protein or RNA that has an enzymatic or structural function.
The percentage of identity in the amino acid sequence of the domain that defines the gene family is preferably at least
70%, more preferably 80 to 95%, most preferably 85 to 99%. To search for members of a gene family within a species,
a low stringency hybridization is usually performed, but this will depend upon the size, distribution and degree of
sequence divergence of domains that define the gene family. SDFs encompassing regulatory regions can be used to
identify coordinately expressed genes by using the regulatory region sequence of the SDF as a probe.

[0116] In the instances where the SDFs are identified as being expressed from genes that confer a particular
phenotype, then the SDFs can also be used as probes to assay plants of different species for those phenotypes.
I.C. Methods to Inhibit Gene Expression

[0117] The nucleic acid molecules of the present invention can be used to inhibit gene transcription and/or translation.
Examples of such methods include, without limitation:

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods to Inhibit Gene Expression.

C.1 Antisense

[0118] In some instances it is desirable to suppress expression of an endogenous or exogenous gene. A well-known
instance is the FLAVOR-SAVOR™ tomato, in which the gene encoding ACC synthase is inactivated by an antisense
approach, thus delaying softening of the fruit after ripening. See, for example, U.S. Patent No. 5,859,330; U.S. Patent
No. 5,723,766; Oeller et al., Science 254:437-439 (1991); and Hamilton et al., Nature 346:284-287 (1990). Also, timing
of flowering can be controlled by suppression of FLOWERING LOCUS C (FLC); high levels of this transcript are
associated with late flowering, while absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell
11:949 (1999)). Also, the transition of apical meristem from production of leaves with associated shoots to flowering is
regulated by TERMINAL FLOWER1, APETALA1 and LEAFY. Thus, when it is desired to induce a transition from shoot
production to flowering, it is desirable to suppress TFL1 expression (S.J. Liljegren, Plant Cell 11:1007 (1999)). As
another instance, arrested ovule development and female sterility result from suppression of the ethylene forming
enzyme but can be reversed by application of ethylene (D. De Martinis et al., Plant Cell 11:1061 (1999)). The ability
to manipulate female fertility of plants is useful in increasing fruit production and creating hybrids.
[0119] In the case of polynucleotides used to inhibit expression of an endogenous gene, the introduced sequence
need not be perfectly identical to a sequence of the target endogenous gene. The introduced polynucleotide sequence
will typically be at least substantially identical to the target endogenous sequence.
[0120] Some polynucleotide SDFs in REF and SEQ TABLES 1 AND 2 represent sequences that are expressed in
corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus the invention includes using these sequences to
generate antisense constructs to inhibit translation and/or degradation of transcripts of said SDFs, typically in a plant cell.
[0121] To accomplish this, a polynucleotide segment from the desired gene that can hybridize to the mRNA expressed
from the desired gene (the "antisense segment") is operably linked to a promoter such that the antisense strand of RNA
will be transcribed when the construct is present in a host cell. A regulated promoter can be used in the construct to
control transcription of the antisense segment so that transcription occurs only under desired circumstances.
[0122] The antisense segment to be introduced generally will be substantially identical to at least a fragment of the
endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit
expression. Further, the antisense product may hybridize to the untranslated region instead of or in addition to the coding
sequence of the gene. The vectors of the present invention can be designed such that the inhibitory effect applies to
other proteins within a family of genes exhibiting homology or substantial homology to the target gene.
[0123] For antisense suppression, the introduced antisense segment sequence also need not be full length relative
to either the primary transcription product or the fully processed mRNA. Generally, a higher percentage of sequence
identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need
not have the same intron or exon pattern, and homology of noncoding segments may be equally effective. Normally,
a sequence of between about 30 or 40 nucleotides and the full length of the transcript can be used, though a sequence
of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a
sequence of at least about 500 nucleotides is especially preferred.
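
An antisense segment is, at the sequence level, the reverse complement of a stretch of the target transcript. The following minimal sketch shows how such a segment could be derived from a cDNA sequence. The function names and the hypothetical input are assumptions for illustration. The 30-nucleotide lower bound reflects the range stated above.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_segment(transcript_cdna, start, length):
    """Return the antisense strand for a window of the transcript.

    transcript_cdna: sense-strand cDNA sequence of the target transcript.
    start, length:   0-based start and size of the window to be targeted.
    """
    if length < 30:
        raise ValueError("segments of about 30 nucleotides or more are normally used")
    window = transcript_cdna[start:start + length]
    if start < 0 or len(window) < length:
        raise ValueError("window extends beyond the transcript")
    return reverse_complement(window)


# Hypothetical 40-nucleotide transcript fragment, used only as an example.
fragment = "ATGC" * 10
print(antisense_segment(fragment, 0, 40))
```

In a construct, the resulting sequence would be placed downstream of the promoter so that the transcribed RNA is complementary to the target mRNA.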

C.2. Ribozymes

[0124] It is also contemplated that gene constructs representing ribozymes and based on the SDFs in REF AND
SEQ TABLES 1 AND 2 are an object of the invention. Ribozymes can also be used to inhibit expression of genes by
suppressing the translation of the mRNA into a polypeptide. It is possible to design ribozymes that specifically pair with
virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating
the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and
cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers
RNA-cleaving activity upon them, thereby increasing the activity of the constructs.
[0125] A number of classes of ribozymes have been identified. One class of ribozymes is derived from a number of
small circular RNAs, which are capable of self-cleavage and replication in plants. The RNAs replicate either alone (viroid
RNAs) or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite
RNAs from tobacco ringspot virus, lucerne transient streak virus, velvet tobacco mottle virus, solanum nodiflorum mottle
virus and subterranean clover mottle virus. The design and use of target RNA-specific ribozymes is described in Haseloff
et al., Nature 334:585 (1988).

[0126] Like the antisense constructs above, the ribozyme sequence fragment necessary for pairing need not be
identical to the target nucleotides to be cleaved, nor identical to the sequences in REF AND SEQ TABLES 1 AND 2.
Ribozymes may be constructed by combining the ribozyme sequence and some fragment of the target gene which
would allow recognition of the target gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the
ribozyme capable of binding to the target sequence exhibits a percentage of sequence identity of at least 80%,
preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more
preferably at least 96%, 97%, 98% or 99% sequence identity to some fragment of a sequence in REF AND SEQ
TABLES 1 AND 2 or the complement thereof. The ribozyme can be equally effective in inhibiting mRNA translation by
cleaving either in the untranslated or coding regions. Generally, a higher percentage of sequence identity can be used
to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same
intron or exon pattern, and homology of non-coding segments may be equally effective.

C.3. Chimeraplasts

[0127] The SDFs of the invention, such as those described by the REF and SEQ Tables, can also be used to construct
chimeraplasts that can be introduced into a cell to produce at least one specific nucleotide change in a sequence
corresponding to the SDF of the invention. A chimeraplast is an oligonucleotide comprising DNA and/or RNA that
specifically hybridizes to a target region in a manner which creates a mismatched base-pair. This mismatched
base-pair signals the cell's repair enzyme machinery, which acts on the mismatched region, resulting in the replacement,
insertion or deletion of designated nucleotide(s). The altered sequence is then expressed by the cell's normal cellular
mechanisms. Chimeraplasts can be designed to repair mutant genes, modify genes, introduce site-specific mutations,
and/or act to interrupt or alter normal gene function (US Pat. Nos. 6,010,907 and 6,004,804; and PCT Pub. Nos.
WO99/58723 and WO99/07865).

C.4. Sense Suppression

[0128] The SDFs of REF and SEQ TABLES 1 AND 2 of the present invention are also useful to modulate gene
expression by sense suppression. Sense suppression represents another method of gene suppression by introducing
at least one exogenous copy or fragment of the endogenous sequence to be suppressed.

[0129] Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect
to the promoter into the chromosome of a plant or by a self-replicating virus has been shown to be an effective means
by which to induce degradation of mRNAs of target genes. For an example of the use of this method to modulate
expression of endogenous genes see Napoli et al., The Plant Cell 2:279 (1990), and U.S. Patent Nos. 5,034,323,
5,231,020, and 5,283,184. Inhibition of expression may require some transcription of the introduced sequence.

[0130] For sense suppression, the introduced sequence generally will be substantially identical to the endogenous
sequence intended to be inactivated. The minimal percentage of sequence identity will typically be greater than about
65%, but a higher percentage of sequence identity might exert a more effective reduction in the level of normal gene
products. Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be
most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family
of genes exhibiting homology or substantial homology to the suppressing sequence.

C.5. Transcriptional Silencing

[0131] The nucleic acid sequences of the invention, including the SDFs of REF and SEQ TABLES 1 AND 2, and
fragments thereof, contain sequences that can be inserted into the genome of an organism resulting in transcriptional
silencing. Such regulatory sequences need not be operatively linked to coding sequences to modulate transcription of
a gene. Specifically, a promoter sequence without any other element of a gene can be introduced into a genome to
transcriptionally silence an endogenous gene (see, for example, Vaucheret, H. et al. (1998) The Plant Journal
16:651-659). As another example, triple helices can be formed using oligonucleotides based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The oligonucleotide can
be delivered to the host cell and can bind to the promoter in the genome to form a triple helix and prevent transcription.
An oligonucleotide of interest is one that can bind to the promoter and block binding of a transcription factor to the
promoter. In such a case, the oligonucleotide can be complementary to the sequences of the promoter that interact
with transcription binding factors.

C.6. Other Methods to Inhibit Gene Expression 

[0132] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to
disrupt transcription or translation of the gene.

[0133] Low frequency homologous recombination can be used to target a polynucleotide insert to a gene by flanking
the polynucleotide insert with sequences that are substantially similar to the gene to be disrupted. Sequences from
REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto can be used for
homologous recombination.

[0134] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest (Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997)). In this method, screening for clones from a library
containing random insertions is preferred to identify those that have polynucleotides inserted into the gene of interest.
Such screening can be performed using probes and/or primers described above based on sequences from REF AND
SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The screening can also be
performed by selecting clones or R1 plants having a desired phenotype.

I.D. Methods of Functional Analysis 

[0135] The constructs described in the methods under I.C. above can be used to determine the function of the
polypeptide encoded by the gene that is targeted by the constructs.

[0136] Down-regulating the transcription and translation of the targeted gene in the host cell or organisms, such as
a plant, may produce phenotypic changes as compared to a wild-type cell or organism. In addition, in vitro assays can
be used to determine if any biological activity, such as calcium flux, DNA transcription, nucleotide incorporation, etc.,
is being modulated by the down-regulation of the targeted gene.

[0137] Coordinated regulation of sets of genes, e.g., those contributing to a desired polygenic trait, is sometimes
necessary to obtain a desired phenotype. SDFs of the invention representing transcription activation and DNA binding
domains can be assembled into hybrid transcriptional activators. These hybrid transcriptional activators can be used
with their corresponding DNA elements (i.e., those bound by the DNA-binding SDFs) to effect coordinated expression
of desired genes (J.J. Schwarz et al., Mol. Cell. Biol. 12:266 (1992); A. Martinez et al., Mol. Gen. Genet. 261:546
(1999)).

[0138] The SDFs of the invention can also be used in the two-hybrid genetic systems to identify networks of
protein-protein interactions (L. McAlister-Henn et al., Methods 19:330 (1999); J.C. Hu et al., Methods 20:80 (2000); M. Golovkin
et al., J. Biol. Chem. 274:36428 (1999); K. Ichimura et al., Biochem. Biophys. Res. Comm. 253:532 (1998)). The SDFs
of the invention can also be used in various expression display methods to identify important protein-DNA interactions
(e.g., B. Luo et al., J. Mol. Biol. 266:479 (1997)).

I.E. Promoters 

[0139] The SDFs of the invention are also useful as structural or regulatory sequences in a construct for modulating
the expression of the corresponding gene in a plant or other organism, e.g. a symbiotic bacterium. For example,
promoter sequences associated to SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be useful in
directing expression of coding sequences either as constitutive promoters or to direct expression in particular cell types,
tissues, or organs or in response to environmental stimuli.

[0140] With respect to the SDFs of the present invention, a promoter is likely to be a relatively small portion of a
genomic DNA (gDNA) sequence located in the first 2000 nucleotides upstream from an initial exon identified in a gDNA
sequence or initial "ATG" or methionine codon or translational start site in a corresponding cDNA sequence. Such
promoters are more likely to be found in the first 1000 nucleotides upstream of an initial ATG or methionine codon or
translational start site of a cDNA sequence corresponding to a gDNA sequence. In particular, the promoter is usually
located upstream of the transcription start site. The fragments of a particular gDNA sequence that function as elements
of a promoter in a plant cell will preferably be found to hybridize to gDNA sequences presented and described in
REF AND SEQ TABLES 1 AND 2 at medium or high stringency, relevant to the length of the probe and its base
composition.
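
The upstream windows mentioned above (the first 1000 or 2000 nucleotides 5' of the initial ATG) can be extracted from a genomic sequence with a few lines of code. This is a minimal sketch under stated assumptions: the function name `upstream_window` is hypothetical, the position of the start codon in the gDNA is assumed to be already known from the cDNA alignment, and the example sequence is made up.

```python
def upstream_window(gdna, start_codon_index, window=1000):
    """Return up to `window` nucleotides immediately 5' of the initial ATG.

    gdna:              genomic DNA sequence, sense strand, 5' to 3'.
    start_codon_index: 0-based index of the A of the initial ATG in gdna.
    window:            size of the candidate promoter region to return.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if gdna[start_codon_index:start_codon_index + 3].upper() != "ATG":
        raise ValueError("start_codon_index does not point at an ATG")
    begin = max(0, start_codon_index - window)  # clip at the 5' end of the sequence
    return gdna[begin:start_codon_index]


# Hypothetical genomic fragment: 1500 nt of upstream sequence, then a start codon.
gdna = "C" * 1500 + "ATG" + "GGG"
print(len(upstream_window(gdna, 1500)))          # candidate region, 1000 nt window
print(len(upstream_window(gdna, 1500, 2000)))    # clipped to the available 1500 nt
```

The returned region is only a candidate. Whether it functions as a promoter would still need to be tested, for example with a reporter gene.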

[0141] Promoters are generally modular in nature. Promoters can consist of a basal promoter that functions as a site
for assembly of a transcription complex comprising an RNA polymerase, for example RNA polymerase II. A typical
transcription complex will include additional factors such as TFIIB, TFIID, and TFIIE. Of these, TFIID appears to be the
only one to bind DNA directly. The promoter might also contain one or more enhancers and/or suppressors that function
as binding sites for additional transcription factors that have the function of modulating the level of transcription with
respect to tissue specificity and of transcriptional responses to particular environmental or nutritional factors, and the
like.

[0142] Short DNA sequences representing binding sites for proteins can be separated from each other by intervening
sequences of varying length. For example, within a particular functional module, protein binding sites may be constituted
by regions of 5 to 60, preferably 10 to 30, more preferably 10 to 20 nucleotides. Within such binding sites, there are
typically 2 to 6 nucleotides that specifically contact amino acids of the nucleic acid binding protein. The protein binding
sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 to 150 nucleotides,
often by 20 to 50 nucleotides. DNA binding sites in promoter elements often display dyad symmetry in their sequence.
Often elements binding several different proteins, and/or a plurality of sites that bind the same protein, will be combined
in a region of 50 to 1,000 basepairs.
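
Dyad symmetry in a candidate binding site can be checked by comparing the site with its own reverse complement. The sketch below looks only for perfect, even-length palindromic sites, which is a simplifying assumption. Real binding sites often show imperfect symmetry or a central spacer. The function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def has_dyad_symmetry(site):
    """True if the site reads the same on both strands (perfect dyad symmetry)."""
    s = site.upper()
    return len(s) > 0 and s == reverse_complement(s)


def find_dyad_sites(sequence, min_len=6, max_len=20):
    """List (start, site) for every even-length perfect dyad within the size range."""
    seq = sequence.upper()
    hits = []
    for length in range(min_len + min_len % 2, max_len + 1, 2):  # even lengths only
        for start in range(0, len(seq) - length + 1):
            candidate = seq[start:start + length]
            if has_dyad_symmetry(candidate):
                hits.append((start, candidate))
    return hits


# Made-up sequence containing the palindromic hexamer GAATTC.
print(find_dyad_sites("TTGAATTCAA", min_len=6, max_len=6))
```

Candidates found this way would be a starting point for locating protein binding sites, not evidence of binding in themselves.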

[0143] Elements that have transcription regulatory function can be isolated from their corresponding endogenous
gene, or the desired sequence can be synthesized, and recombined in constructs to direct expression of a coding
region of a gene in a desired tissue-specific, temporal-specific or other desired manner of inducibility or suppression.
When hybridizations are performed to identify or isolate elements of a promoter by hybridization to the long sequences
presented in REF AND SEQ TABLES 1 AND 2, conditions are adjusted to account for the above-described nature of
promoters. For example, short probes, constituting the element sought, are preferably used under low temperature
and/or high salt conditions. When long probes, which might include several promoter elements, are used, low to medium
stringency conditions are preferred when hybridizing to promoters across species.

[0144] If a nucleotide sequence of an SDF, or part of the SDF, functions as a promoter or fragment of a promoter,
then nucleotide substitutions, insertions or deletions that do not substantially affect the binding of relevant DNA binding
proteins would be considered equivalent to the exemplified nucleotide sequence. It is envisioned that there are
instances where it is desirable to decrease the binding of relevant DNA binding proteins to silence or down-regulate a
promoter, or conversely to increase the binding of relevant DNA binding proteins to enhance or up-regulate a promoter
and vice versa. In such instances, polynucleotides representing changes to the nucleotide sequence of the DNA-protein
contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of
chemically-modified bases, or deletion of one or more nucleotides are considered encompassed by the present invention.
In addition, fragments of the promoter sequences described by the REF and SEQ Tables and variants thereof can be
fused with other promoters or fragments to facilitate transcription and/or transcription in specific type of cells or under
specific conditions.

[0145] Promoter function can be assayed by methods known in the art, preferably by measuring activity of a reporter 
gene operatively linked to the sequence being tested for promoter function. Examples of reporter genes include those 
encoding luciferase, green fluorescent protein, GUS, neo, cat and bar. 

I.F. UTRs and Junctions

[0146] Polynucleotides comprising untranslated (UTR) sequences and intron/exon junctions are also within the scope
of the invention. UTR sequences include introns and 5' or 3' untranslated regions (5' UTRs or 3' UTRs). Fragments of
the sequences shown in REF AND SEQ TABLES 1 AND 2 can comprise UTRs and intron/exon junctions.
[0147] These fragments of SDFs, especially UTRs, can have regulatory functions related to, for example, translation
rate and mRNA stability. Thus, these fragments of SDFs can be isolated for use as elements of gene constructs for
regulated production of polynucleotides encoding desired polypeptides.

[0148] Introns of genomic DNA segments might also have regulatory functions. Sometimes regulatory elements,
especially transcription enhancer or suppressor elements, are found within introns. Also, elements related to stability
of heteronuclear RNA and efficiency of splicing and of transport to the cytoplasm for translation can be found in intron
elements. Thus, these segments can also find use as elements of expression vectors intended for use to transform
plants.

[0149] Just as with promoters, UTR sequences and intron/exon junctions can vary from those shown in REF AND
SEQ TABLES 1 AND 2. Such changes from those sequences preferably will not affect the regulatory activity of the
UTRs or intron/exon junction sequences on expression, transcription, or translation unless selected to do so. However,
in some instances, down- or up-regulation of such activity may be desired to modulate traits or phenotypic or in vitro
activity.

I.G. Coding Sequences

[0150] Isolated polynucleotides of the invention can include coding sequences that encode polypeptides comprising
an amino acid sequence encoded by sequences in REF AND SEQ TABLES 1 AND 2 or an amino acid sequence
presented in REF AND SEQ TABLES 1 AND 2.

[0151] A nucleotide sequence encodes a polypeptide if a cell (or a cell free in vitro system) expressing that nucleotide
sequence produces a polypeptide having the recited amino acid sequence when the nucleotide sequence is transcribed
and the primary transcript is subsequently processed and translated by a host cell (or a cell free in vitro system)
harboring the nucleic acid. Thus, an isolated nucleic acid that encodes a particular amino acid sequence can be a genomic
sequence comprising exons and introns or a cDNA sequence that represents the product of splicing thereof. An isolated
nucleic acid encoding an amino acid sequence also encompasses heteronuclear RNA, which contains sequences that
are spliced out during expression, and mRNA, which lacks those sequences.

[0152] Coding sequences can be constructed using chemical synthesis techniques or by isolating coding sequences
or by modifying such synthesized or isolated coding sequences as described above.
[0153] In addition to coding sequences encoding the polypeptide sequences of REF AND SEQ TABLES 1 AND 2,
which are native to corn, Arabidopsis, soybean, rice, wheat, and other plants, the isolated polynucleotides can be
polynucleotides that encode variants, fragments, and fusions of those native proteins. Such polypeptides are described
below in part II.

[0154] In variant polynucleotides generally, the number of substitutions, deletions or insertions is preferably less than
20%, more preferably less than 15%; even more preferably less than 10%, 5%, 3% or 1% of the number of nucleotides
comprising a particularly exemplified sequence. It is generally expected that non-degenerate nucleotide sequence
changes that result in 1 to 10, more preferably 1 to 5 and most preferably 1 to 3 amino acid insertions, deletions or
substitutions will not greatly affect the function of an encoded polypeptide. The most preferred embodiments are those
wherein 1 to 20, preferably 1 to 10, most preferably 1 to 5 nucleotides are added to, deleted from and/or substituted
in the sequences specifically disclosed in REF AND SEQ TABLES 1 AND 2.

[0155] Insertions or deletions in polynucleotides intended to be used for encoding a polypeptide preferably preserve 
the reading frame. This consideration is not so important in instances when the polynucleotide is intended to be used 
as a hybridization probe. 
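
Whether an insertion or deletion preserves the reading frame comes down to whether the net change in length is a multiple of three. The check below is a minimal sketch of that arithmetic. The function name is hypothetical, and the check deliberately ignores other effects of an indel, such as a new stop codon created at the junction.

```python
def preserves_reading_frame(net_length_change):
    """True if a net insertion (+) or deletion (-) keeps downstream codons in frame.

    net_length_change: inserted nucleotides minus deleted nucleotides.
    """
    return net_length_change % 3 == 0


# Example net changes: +3 and -6 keep the frame, +1 and -2 shift it.
for change in (3, -6, 1, -2):
    print(change, preserves_reading_frame(change))
```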

II. Polypeptides and Proteins

II.A. Native polypeptides and proteins

[0156] Polypeptides within the scope of the invention include both native proteins as well as variants, fragments,
and fusions thereof. Polypeptides of the invention are those encoded by any of the six reading frames of sequences
shown in REF AND SEQ TABLES 1 AND 2, preferably encoded by the three frames reading in the 5' to 3' direction of
the sequences as shown.
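
The six reading frames referred to above are the three frames of the sequence as shown plus the three frames of its reverse complement. The following sketch enumerates them under the assumption that the standard genetic code applies. The function names, the frame labels "+1" to "-3", and the short example sequence are illustrative choices and are not taken from the REF and SEQ Tables.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; unknown codons become 'X', partial codons are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translated peptides."""
    forward = dna.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    frames = {}
    for sign, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


# Made-up 9-nucleotide sequence used only to show the six frames.
for label, peptide in six_frame_translations("ATGGCCTAA").items():
    print(label, peptide)
```

The three "+" frames correspond to the preferred frames reading in the 5' to 3' direction of the sequence as shown.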

[0157] Native polypeptides include the proteins encoded by the sequences shown in REF AND SEQ TABLES 1 AND
2. Such native polypeptides include those encoded by allelic variants.

[0158] Polypeptide and protein variants will exhibit at least 75% sequence identity to those native polypeptides of
REF AND SEQ TABLES 1 AND 2. More preferably, the polypeptide variants will exhibit at least 85% sequence identity;
even more preferably, at least 90% sequence identity; more preferably at least 95%, 96%, 97%, 98%, or 99% sequence
identity. Fragments of polypeptides or proteins will exhibit similar percentages of sequence identity to
the relevant fragments of the native polypeptide. Fusions will exhibit a similar percentage of sequence identity in that
fragment of the fusion represented by the variant of the native peptide.

[0159] Furthermore, polypeptide variants will exhibit at least one of the functional properties of the native protein.
Such properties include, without limitation, protein interaction, DNA interaction, biological activity, immunological
activity, receptor binding, signal transduction, transcription activity, growth factor activity, secondary structure,
three-dimensional structure, etc. As to properties related to in vitro or in vivo activities, the variants preferably exhibit at least
60% of the activity of the native protein; more preferably at least 70%, even more preferably at least 80%, 85%, 90%
or 95% of at least one activity of the native protein.

[0160] One type of variant of native polypeptides comprises amino acid substitutions, deletions and/or insertions.
Conservative substitutions are preferred to maintain the function or activity of the polypeptide.
[0161] Within the scope of percentage of sequence identity described above, a polypeptide of the invention may
have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof
and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences
may be deleted from the polypeptide.

A.1 Antibodies 

[0162] Isolated polypeptides can be utilized to produce antibodies. Polypeptides of the invention can generally be
used, for example, as antigens for raising antibodies by known techniques. The resulting antibodies are useful as
reagents for determining the distribution of the antigen protein within the tissues of a plant or within a cell of a plant.
The antibodies are also useful for examining the production level of proteins in various tissues, for example in a
wild-type plant or following genetic manipulation of a plant, by methods such as Western blotting.
[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional
methods. In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat,
rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum
obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is
generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete
adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of
50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections
of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by
in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent
to in vivo immunization.

[0164] Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating the blood at 4°C for 2-18 hours. The serum is recovered by
centrifugation (e.g., 1,000 x g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.

[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256: 495 (1975), or a
modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the
animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single
cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell
suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane-bound immunoglobulin
specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or
all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a
selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by
limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then cultured either in vitro
(e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known.
[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques.
Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense
reagents, enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For
example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a
blue pigment, quantifiable with a spectrophotometer.

A.2 In Vitro Applications of Polypeptides 

[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the
soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant
proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol
proteinases and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in
therapeutic settings requiring topical application of protease inhibitors.

[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis
of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in
chlorophyll biosynthesis (Kaczor et al. (1994) Plant Physiol. 104: 1411-7; Smith (1988) Biochem. J. 249: 423-8; Schneider
(1976) Z. Naturforsch. [C] 31: 55-63). Thus, ALAD proteins can be used as catalysts in synthesis of heme derivatives.
Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds
representing products of the pathway.

[0170] Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify
and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or
elucidate signal transduction or metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used
in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995),
S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999), Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence
(MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections
(C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.

II.B.(1) Variants

[0172] A type of variant of the native polypeptides comprises amino acid substitutions. Conservative substitutions,
described above (see II), are preferred to maintain the function or activity of the polypeptide. Such substitutions include
conservation of charge, polarity, hydrophobicity, size, etc. For example, one or more amino acid residues within the
sequence can be substituted with another amino acid of similar polarity that acts as a functional equivalent, for example
providing a hydrogen bond in an enzymatic catalysis. Substitutes for an amino acid within an exemplified sequence
are preferably made among the members of the class to which the amino acid belongs. For example, the nonpolar
(hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and
methionine. The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine.
The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino
acids include aspartic acid and glutamic acid.
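
The four residue classes listed above lend themselves to a simple lookup. The following Python sketch, offered only as an illustration, treats a substitution as conservative when both residues fall in the same listed class. The one-letter codes, the names `CLASSES`, `is_conservative` and `classify_substitutions`, and the restriction to equal-length, ungapped sequences are assumptions introduced here, not part of the specification.

```python
# Residue classes as listed in the paragraph above, in one-letter code.
CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the name of the class containing the given residue."""
    for name, members in CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"not one of the 20 listed amino acids: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues belong to the same listed class."""
    return residue_class(original) == residue_class(replacement)


def classify_substitutions(native: str, variant: str):
    """List (1-based position, native, variant, conservative?) per difference.

    Assumes equal-length sequences with no insertions or deletions.
    """
    if len(native) != len(variant):
        raise ValueError("sequences must have equal length")
    return [(i, a, b, is_conservative(a, b))
            for i, (a, b) in enumerate(zip(native.upper(), variant.upper()), 1)
            if a != b]


if __name__ == "__main__":
    # Hypothetical peptides: K->R stays basic, L->E changes class.
    print(classify_substitutions("MKDL", "MRDE"))
```

Class membership is only one of the criteria named in the text; size and hydrogen-bonding capacity are not modeled in this sketch.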

[0173] Within the scope of percentage of sequence identity described above, a polypeptide of the invention may
have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof
and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences
may be deleted from the polypeptide. Amino acid substitutions may also be made in the sequences, conservative
substitutions being preferred.

[0174] One preferred class of variants are those that comprise (1) the domain of an encoded polypeptide and/or (2)
residues conserved between the encoded polypeptide and related polypeptides. For this class of variants, the encoded
polypeptide sequence is changed by insertion, deletion, or substitution at positions flanking the domain and/or
conserved residues.

[0175] Another class of variants includes those that comprise an encoded polypeptide sequence that is changed in
the domain or conserved residues by a conservative substitution.

[0176] Yet another class of variants includes those that lack one of the in vitro activities, or structural features of the
encoded polypeptides. One example is polypeptides or proteins produced from genes comprising dominant negative
mutations. Such a variant may comprise an encoded polypeptide sequence with non-conservative changes in a
particular domain or group of conserved residues.

II.A.(2) FRAGMENTS

[0177] Fragments of particular interest are those that comprise a domain identified for a polypeptide encoded by an
MLS of the instant invention and variants thereof. Also, fragments that comprise at least one region of residues
conserved between an MLS encoded polypeptide and its related polypeptides are of great interest. Fragments are
sometimes useful in the same way that polypeptides corresponding to genes comprising dominant negative mutations are.

II.A.(3) FUSIONS

[0178] Of interest are chimeras comprising (1) a fragment of the MLS encoded polypeptide or variants thereof of
interest and (2) a fragment of a polypeptide comprising the same domain. For example, an AP2 helix encoded by an
MLS of the invention may be fused to a second AP2 helix from ANT protein, which comprises two AP2 helices. The present
invention also encompasses fusions of MLS encoded polypeptides, variants, or fragments thereof fused with related
proteins or fragments thereof.

DEFINITION OF DOMAINS 

[0179] The polypeptides of the invention may possess identifying domains as shown in REF TABLES 1 and 2. Specific
domains within the MLS encoded polypeptides are indicated by the reference REF TABLES 1 and 2. In addition, the
domains within the MLS encoded polypeptide can be defined by the region that exhibits at least 70% sequence identity
with the consensus sequences listed in the detailed description below of each of the domains.
[0180] The majority of the protein domain descriptions given below are obtained from Prosite
(http://www.expasy.ch/prosite/) and Pfam
(http://pfam.wustl.edu/browse.shtml).

1. (AAA) AAA-protein family signature

[0181] A large family of ATPases has been described [1 to 5] whose key feature is that they share a conserved region
of about 220 amino acids that contains an ATP-binding site. This family is now called AAA, for 'A'TPases 'A'ssociated
with diverse cellular 'A'ctivities. The proteins that belong to this family contain either one or two AAA domains.

Proteins containing two AAA domains:

- Mammalian and Drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog, SEC18.
These proteins are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as
between different Golgi cisternae.

- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the
transfer of membranes from the endoplasmic reticulum to the Golgi apparatus. This protein forms a ring-shaped
homooligomer composed of six subunits. The yeast homolog is CDC48 and it may play a role in spindle pole
proliferation.

- Yeast protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia pastoris.

- Yeast protein AFG2.

- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH, which may be part of a transduction
pathway connecting light to cell division.

[0182] Proteins containing a single AAA domain:

- Escherichia coli and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that
seems to degrade the heat-shock sigma-32 factor.

[0183] It is an integral membrane protein with a large cytoplasmic C-terminal domain that contains both the AAA and
the protease domains.

- Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is
also a zinc-dependent protease.

- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent
protease domain.

[0184] Subunits from the regulatory complex of the 26S proteasome [6], which is involved in the ATP-dependent
degradation of ubiquitinated proteins:

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene
mts2).

b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2).

c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3).

d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in yeast (SUG1 or CIM3 or TBY1)
and fission yeast (gene let1).

[0185] Other probable subunits, such as human TBP1, which seems to influence HIV gene expression by interacting
with the virus tat transactivator protein, and yeast YTA1 and YTA6.

- Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.

- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.

- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica.

- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06).

- Caenorhabditis elegans meiotic spindle formation protein mei-1.

- Yeast protein SAP1.

- Yeast protein YTA7.

- Mycobacterium leprae hypothetical protein A2126A.

[0186] It is proposed that, in general, the AAA domains in these proteins act as ATP-dependent protein clamps [5].
In addition to the ATP-binding 'A' and 'B' motifs, which are located in the N-terminal half of this domain, there is a highly
conserved region located in the central part of the domain which was used to develop a signature pattern.
Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R
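
Consensus patterns of this kind follow the PROSITE notation, in which 'x' is any residue, square brackets list permitted residues, braces list excluded residues, and a parenthesized number gives a repeat count. A minimal Python sketch of how such a pattern might be turned into a regular expression and used to scan a sequence is given below. The function names `prosite_to_regex` and `find_matches`, the pattern string as read here, and the toy peptide are assumptions for illustration, and anchors ('<', '>') are deliberately not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split an element such as "x(4)" or "[STA](2)" into core and repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()
        if core == "x":                              # any residue
            atom = "."
        elif core[0] == "[" and core[-1] == "]":     # permitted residues
            atom = core
        elif core[0] == "{" and core[-1] == "}":     # excluded residues
            atom = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha():      # one fixed residue
            atom = core
        else:
            raise ValueError(f"unrecognized pattern element {element!r}")
        if low is not None:
            atom += "{" + low + ("," + high if high else "") + "}"
        pieces.append(atom)
    return "".join(pieces)


def find_matches(pattern: str, sequence: str):
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# AAA consensus pattern as read from the text above (an assumption where
# the printed pattern is ambiguous).
AAA_PATTERN = ("[LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-"
               "[LIVM]-D-x-A-[LIFA]-x-R")

if __name__ == "__main__":
    toy_peptide = "MK" + "LALVAGSNAAAALDAALAR" + "GG"   # made-up sequence
    print(prosite_to_regex(AAA_PATTERN))
    print(find_matches(AAA_PATTERN, toy_peptide))
```

Because `re.finditer` reports non-overlapping hits only, overlapping occurrences of a pattern would require a lookahead-based scan instead.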

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell Biol. 114:443-453(1991).
[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. Cell 64:499-510(1991).
[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990).
[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., Wiebel F.F. Biochimie 75:209-224(1993).
[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995).
[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

2. ABC Membrane (ABC transporter transmembrane region). This family represents a unit of six transmembrane
helices. Many members of the ABC transporter family (ABC_tran) have two such regions. See also descriptions of
ABC_tran, below, and ABC2 membrane, above.

3. (ABC Tran) 

ABC transporters family signature 

[0187] On the basis of sequence similarities, a family of related ATP-binding proteins has been characterized [1 to 5].
These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a
majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All
these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding
site. These proteins are collectively known as ABC transporters. Proteins known to belong to this family are listed
below (references are only provided for recently determined sequences).

In prokaryotes:

- Active transport systems components: alkylphosphonate uptake (phnC/phnK/phnL); arabinose (araG); arginine (artP);
dipeptide (dciAD; dppD/dppF); ferric enterobactin (fepC); ferrichrome (fhuC); galactoside (mglA); glutamine (glnQ);
glycerol-3-phosphate (ugpC); glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP);
iron(III) (sfuC); iron(III) dicitrate (fecE); lactose (lacK); leucine/isoleucine/valine (braF/braG; livF/livG);
maltose (malK); molybdenum (modC); nickel (nikD/nikE); oligopeptide (amiE/amiF; oppD/oppF); peptide (sapD/sapF);
phosphate (pstB); putrescine (potG); ribose (rbsA); spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD).
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB.
- Colicin V export protein cvaB.
- Lactococcin export protein lcnC [6].
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin).
- Extracellular proteases B and C export protein prtD.
- Alkaline protease secretion protein aprD.
- Beta-(1,2)-glucan export proteins chvA and ndvA.
- Haemophilus influenzae capsule-polysaccharide export protein bexA.
- Cytochrome c biogenesis proteins ccmA (also known as cycV and helA).
- Polysialic acid transport protein kpsT.
- Cell division associated ftsE protein (function unknown).
- Copper processing protein nosF from Pseudomonas stutzeri.
- Nodulation protein nodI from Rhizobium (function unknown).
- Escherichia coli proteins cydC and cydD.
- Subunit A of the ABC excision nuclease (gene uvrA).
- Erythromycin resistance protein from Staphylococcus epidermidis (gene msrA).
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7].
- Heterocyst differentiation protein (gene hetA) from Anabaena PCC 7120.
- Protein P29 from Mycoplasma hyorhinis, a probable component of a high affinity transport system.
- yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as Escherichia coli, Klebsiella
pneumoniae, Pseudomonas putida, Rhizobium meliloti and Thiobacillus ferrooxidans.
- Escherichia coli and related bacteria hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejF, yheS,
yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR.

In eukaryotes:

- The multidrug transporters (Mdr) (P-glycoprotein), a family of closely related proteins which extrude a wide variety
of drugs out of the cell (for a review see [8]).
- Cystic fibrosis transmembrane conductance regulator (CFTR), which is most probably involved in the transport of
chloride ions.
- Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, PSF2, RING11, HAM-2, mtp2), which
are involved in the transport of antigens from the cytoplasm to a membrane-bound compartment for association with
MHC class I molecules.
- 70 Kd peroxisomal membrane protein (PMP70).
- ALDP, a peroxisomal protein involved in X-linked adrenoleukodystrophy [9].
- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium channel.
- Drosophila proteins white (w) and brown (bw), which are involved in the import of ommatidium screening pigments.
- Fungal elongation factor 3 (EF-3).
- Yeast STE6, which is responsible for the export of the a-factor pheromone.
- Yeast mitochondrial transporter ATM1.
- Yeast MDL1 and MDL2.
- Yeast SNQ2.
- Yeast sporidesmin resistance protein (gene PDR5 or STS1 or YDR1).
- Fission yeast heavy metal tolerance protein hmt1. This protein is probably involved in the transport of metal-bound
phytochelatins.
- Fission yeast brefeldin A resistance protein (gene bfr1 or hba2).
- Fission yeast leptomycin B resistance protein (gene pmd1).
- mbpX, a hypothetical chloroplast protein from liverwort.
- Prestalk-specific protein tagB from slime mold. This protein consists of two domains: an N-terminal subtilase
catalytic domain and a C-terminal ABC transporter domain.

As a signature pattern for this class of proteins, a conserved region which is located between the 'A' and the 'B'
motifs of the ATP-binding site was used.

[0188] Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]
The ATP-binding region is duplicated in araG, mdl, msrA, rbsA, tlrC, uvrA, yejF, Mdr's, CFTR, pmd1 and in EF-3. In some
of those proteins, the above pattern detects only one of the two copies of the domain. The proteins belonging to this
family also contain one or two copies of the ATP-binding motifs 'A' and 'B'.
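
Because some family members carry two copies of the ATP-binding region, it can be useful to count how many times the signature occurs in a sequence. The Python sketch below does so with a regular expression translated by hand from the ABC transporter consensus pattern as read here, with the brace-delimited exclusions written as negated character classes. The name `signature_starts`, the regular expression and the toy sequence are illustrative assumptions, and, as noted in the text, the absence of a second hit does not show that a second domain is absent.

```python
import re

# Hand translation of the ABC transporter signature as read from the text
# (an assumption where the printed pattern is ambiguous). PROSITE exclusions
# such as {PHY} become negated classes such as [^PHY].
ABC_SIGNATURE = re.compile(
    "[LIVMFYC][SA][SAPGLVFYKQH]G[DENQMW][KRQASPCLIMFW][KRNQSTAVM]"
    "[KRACLVM][LIVMFYPAN][^PHY][LIVMFW][SAGCLIVP][^FYWHP][^KRHP]"
    "[LIVMFYWSTA]"
)


def signature_starts(sequence: str):
    """Return the 1-based start of every non-overlapping signature hit."""
    return [m.start() + 1 for m in ABC_SIGNATURE.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up sequence containing two signature-like segments.
    toy = "MDDD" + "LSGGQRQRVAIARAL" + "TTTT" + "LSGGEKQRVSIARAL" + "KK"
    starts = signature_starts(toy)
    print(f"{len(starts)} copies detected at positions {starts}")
```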
[1] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[2] Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R. BioEssays 8:111-116(1988).
[3] Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A., Evans I.J., Holland I.B., Gray L., Buckel S.D.,
Bell A.W., Hermodson M.A. Nature 323:448-450(1986).
[4] Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas D.C., Sancar A. Nature 323:451-453(1986).
[5] Blight M.A., Holland I.B. Mol. Microbiol. 4:873-880(1990).
[6] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L. Appl. Environ. Microbiol. 58:1952-1961(1992).
[7] Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L. Gene 102:27-32(1991).
[8] Gottesman M.M., Pastan I. J. Biol. Chem. 263:12163-12166(1988).
[9] Valle D., Gaertner J. Nature 361:682-683(1993).
[10] Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV, Boyd A.E. III, Gonzalez G., Herrera-Sosa H.,
Nguy K., Bryan J., Nelson D.A. Science 268:423-426(1995).

4. (ACBP)

Acyl-CoA-binding protein signature

[0189] Acyl-CoA-binding protein (ACBP) is a small (10 Kd) protein that binds medium- and long-chain acyl-CoA
esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters [1]. ACBP is also known
as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the
benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts
as a neuropeptide to modulate the action of the GABA receptor [2]. ACBP is a highly conserved protein of about 90
residues that has been so far found in vertebrates, insects and yeast. ACBP is also related to the N-terminal section
of a probable transmembrane protein of unknown function which has been found in mammals. As a signature pattern,
the region that corresponds to residues 19 to 37 in mammalian ACBP was selected.
Consensus pattern: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G

[1] Rose T.M., Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).
[2] Costa E., Guidotti A. Life Sci. 49:325-344(1991).

5. (AIRS) 

AIR synthase related proteins 

[0190] This family includes hydrogen expression/formation protein HypE, AIR synthases, FGAM synthase and
selenide, water dikinase.

6. (AMP-binding) 

Putative AMP-binding domain signature

[0191] It has been shown [1 to 5] that a number of prokaryotic and eukaryotic enzymes, which all probably act via an
ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity. These enzymes are:

- Insect luciferase (luciferin 4-monooxygenase). Luciferase produces light by catalyzing the oxidation of luciferin in
the presence of ATP and molecular oxygen.
- Alpha-aminoadipate reductase from yeast (gene LYS2). This enzyme catalyzes the activation of alpha-aminoadipate by
ATP-dependent adenylation and the reduction of activated alpha-aminoadipate by NADPH.
- Acetate-CoA ligase (acetyl-CoA synthetase), an enzyme that catalyzes the formation of acetyl-CoA from acetate and CoA.
- Long-chain-fatty-acid-CoA ligase, an enzyme that activates long-chain fatty acids for both the synthesis of cellular
lipids and their degradation via beta-oxidation.
- 4-coumarate-CoA ligase (4CL), a plant enzyme that catalyzes the formation of 4-coumarate-CoA from 4-coumarate and
coenzyme A; the branchpoint reactions between general phenylpropanoid metabolism and pathways leading to various
specific end products.
- O-succinylbenzoic acid-CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial enzyme involved in the
biosynthesis of menaquinone (vitamin K2).
- 4-Chlorobenzoate-CoA ligase (EC 6.2.1.-) (4-CBA-CoA ligase) [7], a Pseudomonas enzyme involved in the degradation
of 4-CBA.
- Indoleacetate-lysine ligase (IAA-lysine synthetase) [8], an enzyme from Pseudomonas syringae that converts
indoleacetate to IAA-lysine.
- Bile acid-CoA ligase (gene baiB) from Eubacterium strain VPI 12708 [4]. This enzyme catalyzes the ATP-dependent
formation of a variety of C-24 bile acid-CoA.
- Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC).
- L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various fungi (gene acvA or pcbAB). This
enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin, the formation of ACV from the
constituent amino acids. The amino acids seem to be activated by adenylation. It is a protein of around 3700 amino
acids that contains three related domains of about 1000 amino acids.
- Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the first step in the biosynthesis
of the cyclic antibiotic gramicidin S, the ATP-dependent racemization of phenylalanine.
- Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction carried out by tycA is identical to that
catalyzed by grsA.
- Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme is a multifunctional protein that activates
and polymerizes proline, valine, ornithine and leucine. GrsB consists of four related domains.
- Enterobactin synthetase components E (gene entE) and F (gene entF) from Escherichia coli. These two enzymes are
involved in the ATP-dependent activation of, respectively, 2,3-dihydroxybenzoate and serine during enterobactin
(enterochelin) biosynthesis.
- Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain
three related domains while subunit 3 contains only a single domain.
- HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme activates the four amino acids (Pro, L-Ala,
D-Ala and 2-amino-9,10-epoxi-8-oxodecanoic acid) that make up HC-toxin, a cyclic tetrapeptide. HTS1 consists of four
related domains.

There are also some proteins whose exact function is not yet known, but which are, very probably, also AMP-binding
enzymes. These proteins are:

- ORA (octapeptide-repeat antigen), a Plasmodium falciparum protein whose function is not known but which shows a
high degree of similarity with the above proteins.
- AngR, a Vibrio anguillarum protein. AngR is thought to be a transcriptional activator which modulates the
anguibactin (an iron-binding siderophore) biosynthesis gene cluster operon. But it is believed [9] that angR is not a
DNA-binding protein, but rather an enzyme involved in the biosynthesis of anguibactin. This conclusion is based on
three facts: the presence of the AMP-binding domain; the size of angR (1048 residues), which is far bigger than any
bacterial transcriptional protein; and the presence of a probable S-acyl thioesterase immediately downstream of angR.
- A hypothetical protein in mmsB 5' region in Pseudomonas aeruginosa.
- Escherichia coli hypothetical protein ydiD.
- Yeast hypothetical protein YBR041w.
- Yeast hypothetical protein YBR222c.
- Yeast hypothetical protein YER147c.

All these proteins contain a highly conserved region very rich in glycine, serine, and threonine which is followed by a
conserved lysine. A parallel can be drawn between this type of domain and the G-x(4)-G-K-[ST] ATP-/GTP-binding
'P-loop' domain or the protein kinases G-x-G-x(2)-[SG]-x(10,20)-K ATP-binding domains.

[0192] Consensus pattern: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR] In a majority of
cases the residue that follows the Lys at the end of the pattern is a Gly.
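
The observation that the residue following the terminal Lys is usually Gly can be checked alongside the pattern match itself. The following Python sketch reports each hit together with a flag for that trailing glycine. The regular expression is a hand translation of the consensus pattern as read here, and the name `amp_binding_hits` and the toy peptide are assumptions for illustration only.

```python
import re

# Hand translation of the putative AMP-binding signature as read from the
# text (an assumption where the printed pattern is ambiguous).
AMP_BINDING = re.compile("[LIVMFY]..[STG][STAG]G[ST][STEI][SG].[PASLIVM][KR]")


def amp_binding_hits(sequence: str):
    """Return (1-based start, segment, followed_by_gly) for each hit."""
    seq = sequence.upper()
    hits = []
    for m in AMP_BINDING.finditer(seq):
        # The text notes that Gly usually follows the final Lys/Arg position.
        followed_by_gly = m.end() < len(seq) and seq[m.end()] == "G"
        hits.append((m.start() + 1, m.group(), followed_by_gly))
    return hits


if __name__ == "__main__":
    # Made-up peptide containing one signature-like segment followed by Gly.
    print(amp_binding_hits("LIMNSSGSTGLPKGVAL"))
```

The flag is informational only: a hit without the trailing glycine still satisfies the consensus pattern.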

[1] Toh H. Protein Seq. Data Anal. 4:111-117(1991).
[2] Smith D.J., Earl A.J., Turner G. EMBO J. 9:2743-2750(1990).
[3] Schroeder J. Nucleic Acids Res. 17:460-460(1989).
[4] Mallonee D.H., Adams J.L., Hylemon P.B. J. Bacteriol. 174:2065-2071(1992).
[5] Turgay K., Krause M., Marahiel M.A. Mol. Microbiol. 6:529-546(1992).
[6] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[7] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H.,
Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[8] Farrell D.H., Mikesell P., Actis L.A., Crosa J.H. Gene 86:45-51(1990).

7. AP2 domain

[0193] This 60 amino acid residue domain can bind to DNA [1]. This domain is plant specific. Members of this family
are suggested to be related to pyridoxal phosphate-binding domains such as found in aminotran_2 [3]. AP2 domains
are also described in Jofuku et al., copending U.S. Patent applications 08/700,152, 08/879,827, 08/912,272,
09/026,039.

[1] Ohme-Takagi M, Shinshi H; Plant Cell 1995;7:173-182.
[2] Weigel D; Plant Cell 1995;7:388-389.
[3] Mushegian AR, Koonin EV; Genetics 1996;144:817-828.

8. ARID

[0194] The ARID domain is an AT-Rich Interaction domain sharing structural homology to DNA replication and repair
nucleases and polymerases.

[1] Herrscher RF, Kaplan MH, Leisz DL, Das C, Scheuermann R, Tucker PW; Genes Dev 1995;9:3067-3082.
[2] Yuan YC, Whitson RH, Liu Q, Itakura K, Chen Y; Nat Struct Biol 1998;5:959-964.

9. (ATP synt)

ATP synthase gamma subunit signature

[0195] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane
of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex
is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1).
The former acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon.
Subunit gamma is believed to be important in regulating ATPase activity and the flow of protons through the CF(0)
complex. The best conserved region of the gamma subunit [3] is its C-terminus, which seems to be essential for
assembly and catalysis. As a signature pattern to detect ATPase gamma subunits, a 14 residue conserved segment where
the last amino acid is found one to three residues from the C-terminal extremity was used.

[0196] Consensus pattern: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR]. Note: Pea chloroplast gamma and two Bacillus
species gamma subunits are not detected by this motif.
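
The gamma subunit signature carries a positional condition in addition to the residue pattern: the last residue of the 14-residue segment lies one to three residues from the C-terminal extremity. A minimal Python sketch of a check that applies both conditions follows. It interprets the positional condition as requiring one to three residues after the end of the match, which is an assumption, as are the hand-translated regular expression, the name `gamma_signature_hit` and the toy sequences.

```python
import re

# Hand translation of the gamma subunit signature as read from the text
# (an assumption where the printed pattern is ambiguous).
GAMMA_SIGNATURE = re.compile("[IV]T.E..[DE]...GA.[SAKR]")


def gamma_signature_hit(sequence: str, min_tail: int = 1, max_tail: int = 3):
    """Return (1-based start, residues after the match) for the first hit
    whose distance from the C-terminus lies in [min_tail, max_tail].

    Returns None when no hit satisfies the positional condition.
    """
    seq = sequence.upper()
    for m in GAMMA_SIGNATURE.finditer(seq):
        tail = len(seq) - m.end()
        if min_tail <= tail <= max_tail:
            return (m.start() + 1, tail)
    return None


if __name__ == "__main__":
    # Made-up sequences: same segment, near to and far from the C-terminus.
    print(gamma_signature_hit("MKKLL" + "ITKELIEIVSGAAA" + "LE"))
    print(gamma_signature_hit("MKKLL" + "ITKELIEIVSGAAA" + "LEAAAAAAAA"))
```

As the note above indicates, some genuine gamma subunits are not detected by this motif, so a result of None is not evidence that a sequence falls outside the family.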

[1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[3] Miki J., Maeda M., Mukohata Y., Futai M. FEBS Lett. 232:221-226(1988).

10. (ATP Synt A) 
Synthase a subunit signature 

[0197] ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed coupling factor CF(1). The CF(0) a subunit, also called protein 6, is a key component of the proton channel; it may play a direct role in translocating protons across the membrane. It is a highly hydrophobic protein that has been predicted to contain 8 transmembrane regions [3]. Sequence comparison of a subunits from all available sources reveals very few conserved regions. The best conserved region is located in what is predicted to be the fifth transmembrane domain. This region contains three perfectly conserved residues: an arginine, a leucine and an asparagine. Mutagenesis experiments of ATPase activity. This region was selected as a signature pattern.

Consensus pattern: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT] [R is important for proton translocation]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Lewis M.J., Chang J.A., Simoni R.D. J. Biol. Chem. 265:10541-10550(1990).
[ 4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989).

11. ATP synthases 

[0198] Part of the CF(0) (base unit) of the ATP synthase. The base unit is thought to translocate protons through the membrane (inner membrane in mitochondria, thylakoid membrane in plants, cytoplasmic membrane in bacteria). The B subunits are thought to interact with the stalk of the CF(1) subunits.

12. (ATP synt C) 

ATP synthase c subunit signature 

[0199] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed coupling factor CF(1). The CF(0) c subunit (also called protein 9, proteolipid, or subunit III) [3,4] is a highly hydrophobic protein of about 8 Kd which has been implicated in the proton-conducting activity of ATPase. Structurally, subunit c consists of two long terminal hydrophobic regions, which probably span the membrane, and a central hydrophilic region. N,N'-dicyclohexylcarbodiimide (DCCD) can bind covalently to subunit c and thereby abolish the ATPase activity. DCCD binds to a specific glutamate or aspartate residue which is located in the middle of the second hydrophobic region near the C-terminus of the protein. A signature pattern which includes the DCCD-binding residue was derived.

[0200] Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE] [D or E binds DCCD]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Ivaschenko A.T., Karpenyuk T.A., Ponomarenko S.V. Biokhimiia 56:406-419(1991).

[ 4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).
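Because the c subunit signature ends on the residue annotated as binding DCCD, a match also locates that residue. The following is a minimal sketch, assuming Python's standard `re` module and a hand-written regular expression equivalent to the c subunit consensus pattern; the function name `dccd_binding_sites` and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# Regex equivalent of
# [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].
# The lookahead wrapper lets overlapping occurrences be reported too.
C_SUBUNIT = re.compile(
    r"(?=([GSTA]R[NQ]P.{10}[LIVMFYW]{2}.{3}[LIVMFYW].[DE]))"
)

def dccd_binding_sites(sequence):
    """Return (position, residue) for the final D/E of every match.

    Positions are 1-based, counted from the N-terminus.
    """
    sites = []
    for m in C_SUBUNIT.finditer(sequence):
        pos = m.end(1)  # end offset of the match == 1-based index of last residue
        sites.append((pos, sequence[pos - 1]))
    return sites

# Hypothetical toy sequence, not a real c subunit.
toy = "MK" + "GRNP" + "A" * 10 + "LL" + "AAA" + "L" + "A" + "E" + "KK"
print(dccd_binding_sites(toy))
```

A match only indicates that a sequence fits the signature; whether the reported residue actually binds DCCD would still have to be established experimentally.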

13. (ATPsyntDE) 

ATP synthase, Delta/Epsilon chain

[0201] Part of the ATP synthase CF(1). These subunits are part of the head unit of the ATP synthase. The subunits are called delta and epsilon in human and metazoan species, but in bacterial species the delta (D) subunit is the equivalent to the Oligomycin sensitive subunit (OSCP) in metazoans.

14. (ATPsyntab) 

ATP synthase alpha and beta subunits signature 

[0202] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1). The former acts as a proton channel; the latter is composed of five subunits, alpha, beta, gamma, delta and epsilon. The sequences of subunits alpha and beta are related and both contain a nucleotide-binding site for ATP and ADP. The beta chain has catalytic activity, while the alpha chain is a regulatory subunit. Vacuolar ATPases [3] (V-ATPases) are responsible for acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector (70 Kd) is related to that of F-ATPase beta subunit, while a 60 Kd subunit, from the same sector, is related to the F-ATPases alpha subunit [4]. Archaebacterial membrane-associated ATPases are composed of three subunits. The alpha chain is related to F-ATPases beta chain and the beta chain is related to F-ATPases alpha chain [4]. A protein highly similar to F-ATPase beta subunits is found [5] in some bacterial apparatus involved in a specialized protein export pathway that proceeds without signal peptide cleavage. This protein is known as fliI in Bacillus and Salmonella, Spa47 (mxiB) in Shigella flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia virulence plasmids. To detect these ATPase subunits, a segment of ten amino-acid residues, containing two conserved serines, was selected as a signature pattern. The first serine seems to be important for catalysis - in the ATPase alpha chain at least - as its mutagenesis causes catalytic impairment.

[0203] Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Nelson N. J. Bioenerg. Biomembr. 21:553-571(1989).

[ 4] Gogarten J.P., Kibak H., Dittrich P., Taiz L., Bowman E.J., Bowman B.J., Manolson M.F., Poole R.J., Date T., Oshima T., Konishi J., Denda K., Yoshida M. Proc. Natl. Acad. Sci. U.S.A. 86:6661-6665(1989).
[ 5] Dreyfus G., Williams A.W., Kawagishi I., MacNab R.M. J. Bacteriol. 175:3131-3138(1993).

15. (ATPsynt ab C) 

ATP synthase ab C terminal. 

[0204] Number of members: 190

[1] Abrahams JP, Leslie AG, Lutter R, Walker JE; Structure at 2.8 A resolution of F1-ATPase from bovine heart mitochondria. Nature 1994;370:621-628.

16. (A deaminase) 

Adenosine and AMP deaminase signature

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase catalyzes the hydrolytic deamination of AMP into IMP. It has been shown [1] that these two types of enzymes share three regions of sequence similarity; these regions are centered on residues which are proposed to play an important role in the catalytic mechanism of these two enzymes. One of these regions, containing two conserved aspartic acid residues that are potential active site residues, was selected.
Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site residues]
[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.E. Biochemistry 30:2273-2280(1991).

17. (Acetyltransf)

Acetyltransferase (GNAT) family.

[0206] This family contains proteins with N-acetyltransferase functions.
[1] Neuwald AF, Landsman D; Trends Biochem Sci 1997;22:154-155.

18. (AconitaseC)

Aconitase family signature

[0207] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for proteins from the aconitase family, two conserved regions that contain the three cysteine ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[^ [C binds the iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

19. (Acyl-CoAdh) 

Acyl-CoA dehydrogenases signatures

[0208] Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha,beta-dehydrogenation of acyl-CoA esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA dehydrogenases are FAD flavoproteins. This family currently includes: - Five eukaryotic isozymes that catalyze the first step of the beta-oxidation cycles for fatty acids with various chain lengths. These are short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long (LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA dehydrogenases. These enzymes are located in the mitochondrion. They are all homotetrameric proteins of about 400 amino acid residues except VLCAD, which is a dimer and which contains, in its mature form, about 600 residues. - Glutaryl-CoA dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and tryptophan. - Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the catabolism of leucine. - Acyl-CoA dehydrogenases acsA and mmgC from Bacillus subtilis. - Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium acetobutylicum. - Escherichia coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were selected as signature patterns. The first is located in the center of these enzymes, the second in the C-terminal section.
Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA]

Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN]

[ 1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).

[ 2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J., Tanaka K. J. Biol. Chem. 264:16321-16331(1989).

[ 3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19088-19094(1994).

[ 4] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).

20. (Acyltransf)

Acyl transferase domain

[0209] Number of members: 161
[1] Serre L, Verbree EC, Dauter Z, Stuitje AR, Derewenda ZS; Medline: 95286570. The Escherichia coli malonyl-CoA:acyl carrier protein transacylase at 1.5-A resolution. Crystal structure of a fatty acid synthase component. J Biol Chem 1995;270:12961-12964.

21. Acylphosphatase signatures 

[0210] Acylphosphatase (EC 3.6.1.7) [1,2] catalyzes the hydrolysis of various acylphosphate carboxyl-phosphate bonds such as carbamyl phosphate, succinyl phosphate, 1,3-diphosphoglycerate, etc. The physiological role of this enzyme is not yet clear. Acylphosphatase is a small protein of around 100 amino acid residues. There are two known isozymes. One seems to be specific to muscular tissues; the other, called 'organ-common type', is found in many different tissues. While acylphosphatases have so far only been characterized in vertebrates, there are a number of bacterial and archaebacterial hypothetical proteins that are highly similar to that enzyme and that probably possess the same activity. These proteins are: - Escherichia coli hypothetical protein yccX. - Bacillus subtilis hypothetical protein yflL. - Archaeoglobus fulgidus hypothetical protein AF0818. Two conserved regions were selected as signature patterns. The first is located in the N-terminal section, while the second is found in the central part of the protein sequence.

Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R

Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G

[ 1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995).

[ 2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997).

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures. 

[0211] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis. In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins - (gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain (AP19 in AP-1; AP17 in AP-2). The medium chains of AP-1 and AP-2 are evolutionarily related proteins of about 50 Kd. Homologs of AP47 and AP50 have also been found in Caenorhabditis elegans (genes unc-101 and ap50) [2] and yeast (gene APM1 or YAP54) [3]. Some more divergent, but clearly evolutionarily related proteins have also been found in yeast: APM2 and YBR288c. Two conserved regions were selected as signature patterns, one located in the N-terminal region, the other from the central section of these proteins.

Consensus pattern: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-[LIVMT]-E
Consensus pattern: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y

[ 1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990).

[ 2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994).

[ 3] Nakayama Y., Goebl M., O'Brine G.B., Lemmon S., Pingchang C.E., Kirchhausen T. Eur. J. Biochem. 202:569-574(1991).

23. (Adenylsucc synt)
Adenylosuccinate synthetase signatures 

[0212] Adenylosuccinate synthetase (EC 6.3.4.4) [1] plays an important role in purine biosynthesis, by catalyzing the GTP-dependent conversion of IMP and aspartic acid to AMP. Adenylosuccinate synthetase has been characterized from various sources ranging from Escherichia coli (gene purA) to vertebrate tissues. In vertebrates, two isozymes are present - one involved in purine biosynthesis and the other in the purine nucleotide cycle. Two conserved regions were selected as signature patterns. The first one is a perfectly conserved octapeptide located in the N-terminal section and which is involved in GTP-binding [2]. The second one includes a lysine residue known [2] to be essential for the enzyme's activity.

Consensus pattern: Q-W-G-D-E-G-K-G

Consensus pattern: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]
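Where a family is described by two signatures, a candidate sequence can be screened for both at once. The following is a minimal sketch, assuming Python's standard `re` module and hand-written regular expressions equivalent to the two adenylosuccinate synthetase patterns; the function name `scan_two_signatures`, the returned keys and the toy sequence are hypothetical.

```python
import re

GTP_SITE = re.compile(r"QWGDEGKG")             # Q-W-G-D-E-G-K-G
LYS_SITE = re.compile(r"GI[GR]P.Y.{2}K.{2}R")  # G-I-[GR]-P-x-Y-x(2)-K-x(2)-R
LYS_OFFSET = 8  # 0-based index of the annotated K inside a LYS_SITE match

def scan_two_signatures(sequence):
    """Report the first hit of each signature (1-based positions)."""
    gtp = GTP_SITE.search(sequence)
    lys = LYS_SITE.search(sequence)
    return {
        "gtp_site_start": gtp.start() + 1 if gtp else None,
        "active_site_lys": lys.start() + LYS_OFFSET + 1 if lys else None,
        "both_present": bool(gtp and lys),
    }

# Hypothetical toy sequence, not a real synthetase.
toy = "MA" + "QWGDEGKG" + "AAAA" + "GIGPAYAAKAAR" + "L"
print(scan_two_signatures(toy))
```

Requiring both signatures is a stricter criterion than either alone; whether to require one or both is a design choice left open here.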

[ 1] Wiesmueller L., Wittbrodt J., Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-2485(1991).

[ 2] Silva M.M., Poland B.W., Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol. 254:431-446(1995).
[ 3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-154(1996).

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures

[0213] S-adenosyl-L-homocysteine hydrolase (EC 3.3.1.1) (AdoHcyase) is an enzyme of the activated methyl cycle, responsible for the reversible hydration of S-adenosyl-L-homocysteine into adenosine and homocysteine. AdoHcyase is a ubiquitous enzyme which binds and requires NAD+ as a cofactor. AdoHcyase is a highly conserved protein [1] of about 430 to 470 amino acids. Two highly conserved regions were selected as signature patterns. The first pattern is located in the N-terminal section; the second is derived from a glycine-rich region in the central part of AdoHcyase, a region thought to be involved in NAD-binding.

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]
Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[VC]-G-[KRL]-G-x-[ASC]

[ 1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A. 89:6328-6332(1992).

25. AhpC/TSA family

[0214] This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol specific antioxidant (TSA).

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG. Proc Natl Acad Sci U S A 1994;91:7017-7021

26. (Aldose epim) 

Aldose 1-epimerase putative active site

[0215] Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible for the anomeric interconversion of D-glucose and other aldoses between their alpha- and beta-forms. The sequence of mutarotase from two bacteria, Acinetobacter calcoaceticus and Streptococcus thermophilus, is available [1]. It has also been shown that, on the basis of extensive sequence similarities, a mutarotase domain seems to be present in the C-terminal half of the fungal GAL10 protein, which encodes, in the N-terminal part, UDP-glucose 4-epimerase. The best conserved region in the sequence of mutarotase is centered around a conserved histidine residue which may be involved in the catalytic mechanism.
Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI]

[ 1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.F. J. Bacteriol. 172:4037-4047(1990).

27. (AlkA DNA repair) 

Alkylbase DNA glycosidases alkA family signature 

[0216] Alkylbase DNA glycosidases [1] are DNA repair enzymes that hydrolyze the deoxyribose N-glycosidic bond to excise various alkylated bases from a damaged DNA polymer. In Escherichia coli there are two alkylbase DNA glycosidases: one (gene tag) which is constitutively expressed and which is specific for the removal of 3-methyladenine (EC 3.2.2.20), and one (gene alkA) which is induced during adaptation to alkylation and which can remove a variety of alkylation products (EC 3.2.2.21). Tag and alkA do not share any region of sequence similarity. In yeast there is an alkylbase DNA glycosidase (gene MAG1) [2,3], which can remove 3-methyladenine or 7-methyladenine and which is structurally related to alkA. MAG and alkA are both proteins of about 300 amino acid residues. While the C- and N-terminal ends appear to be unrelated, there is a central region of about 130 residues which is well conserved. A portion of this region has been selected as a signature pattern.

Consensus pattern: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[MF]-x(2)-[ED]-D

[ 1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988).
[ 2] Berdal K.G., Bjoras M., Bjelland S., Seeberg E.C. EMBO J. 9:4563-4568(1990).
[ 3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990).

28. Ammonium transporters signature 

[0217] A number of proteins involved in the transport of ammonium ions across a membrane, as well as some yet uncharacterized proteins, have been shown [1,2] to be evolutionarily related. These proteins are: - Yeast ammonium transporters MEP1, MEP2 and MEP3. - Arabidopsis thaliana high affinity ammonium transporter (gene AMT1). - Corynebacterium glutamicum ammonium and methylammonium transport system. - Escherichia coli putative ammonium transporter amtB. - Bacillus subtilis nrgA. - Mycobacterium tuberculosis hypothetical protein MtCY338.09c. - Synechocystis strain PCC 6803 hypothetical proteins sll0108, sll0537 and sll1017. - Methanococcus jannaschii hypothetical proteins MJ0058 and MJ1343. - Caenorhabditis elegans hypothetical proteins C05E11.4, F49E11.3 and M195.3. As expected from their transport function, these proteins are highly hydrophobic and seem to contain from 10 to 12 transmembrane domains. The best conserved region seems to be located in the fifth (or sixth) transmembrane region and is used as a signature pattern.

Consensus pattern: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-[SAG]-[LIVMF]-x(3)-[LIVMFY^^ R

[ 1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).

[ 2] Siewe R.M., Weil B., Burkovski A., Eikmanns B.J., Eikmanns M., Kraemer R. J. Biol. Chem. 271:5398-5403(1996).

[ 3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:61-136(1998).

29. (Arch_histone)
CBF/NF-Y subunits signatures

[0218] Diverse DNA binding proteins are known to bind the CCAAT box, a common cis-acting element found in the promoter and enhancer regions of a large number of genes in eukaryotes. Amongst these proteins is one known as the CCAAT-binding factor (CBF) or NF-Y [1]. CBF is a heteromeric transcription factor that consists of two different components, both needed for DNA-binding. The HAP protein complex of yeast binds to the upstream activation site of the cytochrome C iso-1 gene (CYC1) as well as other genes involved in mitochondrial electron transport and activates their expression. It also recognizes the sequence CCAAT and is structurally and evolutionarily related to CBF. The first subunit of CBF, known as CBF-A or NF-YB in vertebrates, HAP3 in budding yeast and php3 in fission yeast, is a protein of 116 to 210 amino-acid residues which contains a highly conserved central domain of about 90 residues. This domain seems to be involved in DNA-binding; a signature pattern has been developed from its central part. The second subunit of CBF, known as CBF-B or NF-YA in vertebrates, HAP2 in budding yeast and php2 in fission yeast, is a protein of 265 to 350 amino-acid residues which contains a highly conserved region of about 60 residues. This region, called the 'essential core' [2], seems to consist of two subdomains: an N-terminal subunit-association domain and a C-terminal DNA recognition domain. A signature pattern has been developed from a section of the subunit-association domain.
Consensus pattern: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C

Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E

[ 1] Li X.-Y., Mantovani R., Hooft van Huijsduijnen R., Andre I., Benoist C., Mathis D. Nucleic Acids Res. 20:1087-1091(1992).

[ 2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures 

[0219] Argininosuccinate synthase (EC 6.3.4.5) (AS) is a urea cycle enzyme that catalyzes the penultimate step in arginine biosynthesis: the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and pyrophosphate [1,2]. In humans, a defect in the AS gene causes citrullinemia, a genetic disease characterized by severe vomiting spells and mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid residues. An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS from various prokaryotes, archaebacteria and eukaryotes show significant similarity. Two signature patterns have been selected for AS. The first is a highly conserved stretch of nine residues located in the N-terminal extremity of these enzymes; the second is derived from a conserved region which contains one of the conserved arginine residues.
Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]
Consensus pattern: G-x-T-x-K-G-N-D-x(2)-R-F

[ 1] van Vliet F., Crabeel M., Boyen A., Tricot C., Stalon V., Falmagne P., Nakamura Y., Baumberg S., Glansdorff N. Gene 95:99-104(1990).

[ 2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).

31. Armadillo/beta-catenin-like repeats

[0220] Approx. 40 amino acid repeat. Tandem repeats form a super-helix of helices that is proposed to mediate interaction of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats.

[1] Huber AH, Nelson WJ, Weis WI, Cell 1997;90:871-882.

[2] Gumbiner BM, Curr Opin Cell Biol 1995;7:634-640.

[3] Cavallo R, Rubenstein D, Peifer M, Curr Opin Genet Dev 1997;7:459-466.

[4] Su LK, Vogelstein B, Kinzler KW, Science 1993;262:1734-1737.

[5] Masiarz FR, Munemitsu S, Polakis P, Science 1993;262:1731-1734.
[6] Peifer M, Wieschaus E, Cell 1990;63:1167-1176.

32. (Asn Synthase)
Asparagine synthase

[0221] This family is always found associated with GATase 2. Members of this family catalyse the conversion of aspartate to asparagine.

33. Asparaginase_2

Asparaginase. Number of members: 12

34. (AspartyltRNA N)

Aminoacyl-transfer RNA synthetases class-II signatures

[0222] Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMF^

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).

[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).

[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

35. (ArfGap) Putative GTP-ase activating protein for Arf. Putative zinc fingers with GTPase activating proteins (GAPs) towards the small GTPase, Arf. The GAP of ARD1 stimulates GTPase hydrolysis for ARD1 but not ARFs. Number of members: 34

[0223]

[1] Medline: 96324970. Identification and cloning of centaurin-alpha. A novel phosphatidylinositol 3,4,5-trisphosphate-binding protein from rat brain. Hammonds-Odie LP, Jackson TR, Profit AA, Blader IJ, Turck CW, Prestwich GD, Theibert AB; J Biol Chem 1996;271:18859-18868.
[2] Medline: 97296423. A target of phosphatidylinositol 3,4,5-trisphosphate with a zinc finger motif similar to that of the ADP-ribosylation-factor GTPase-activating protein and two pleckstrin homology domains. Tanaka K, Imajoh-Ohmi S, Sawada T, Shirai R, Hashimoto Y, Iwasaki S, Kaibuchi K, Kanaho Y, Shirai T, Terada Y, Kimura K, Nagata S, Fukui Y; Eur J Biochem 1997;245:512-519.

[3] Medline: 98112795. Molecular characterization of the GTPase-activating domain of ADP-ribosylation factor domain protein 1 (ARD1). Vitale N, Moss J, Vaughan M; J Biol Chem 1998;273:2553-2560.

36. Apolipoprotein. Apolipoprotein A1/A4/E family. This family includes: Swiss:P02647 Apolipoprotein A-I. Swiss:P06727 Apolipoprotein A-IV. Swiss:P02649 Apolipoprotein E. These proteins contain several 22 residue repeats which form a pair of alpha helices. Number of members: 42

[0224] [1] Medline: 91289138. Three-dimensional structure of the LDL receptor-binding domain of human apolipoprotein E. Wilson C, Wardell MR, Weisgraber KH, Mahley RW, Agard DA; Science 1991;252:1817-1822.

37. Amino acid permeases signature

[0225] Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell. A number of such proteins have been found to be evolutionarily related [1,2,3]. These proteins are: - Yeast general amino acid permeases (genes GAP1, AGP2 and AGP3). - Yeast basic amino acid permease (gene ALP1). - Yeast Leu/Val/Ile permease (gene BAP2). - Yeast arginine permease (gene CAN1). - Yeast dicarboxylic amino acid permease (gene DIP5). - Yeast asparagine/glutamine permease (gene AGP1). - Yeast glutamine permease (gene GNP1). - Yeast histidine permease (gene HIP1). - Yeast lysine permease (gene LYP1). - Yeast proline permease (gene PUT4). - Yeast valine and tyrosine permease (gene VAL1/TAT1). - Yeast tryptophan permease (gene TAT2/SCM2). - Yeast choline transport protein (gene HNM1/CTR1). - Yeast GABA permease (gene UGA4). - Yeast hypothetical protein YKL174c. - Fission yeast protein isp5. - Fission yeast hypothetical protein SpAC8A4.11. - Fission yeast hypothetical protein SpAC11D3.08c. - Emericella nidulans proline transport protein (gene prnB). - Trichoderma harzianum amino acid permease INDA1. - Salmonella typhimurium L-asparagine permease (gene ansP). - Escherichia coli aromatic amino acid transport protein (gene aroP). - Escherichia coli D-serine/D-alanine/glycine transporter (gene cycA). - Escherichia coli GABA permease (gene gabP). - Escherichia coli lysine-specific permease (gene lysP). - Escherichia coli phenylalanine-specific permease (gene pheP). - Salmonella typhimurium proline-specific permease (gene proY). - Escherichia coli and Klebsiella pneumoniae hypothetical protein yeeF. - Escherichia coli and Salmonella typhimurium hypothetical protein yifK. - Bacillus subtilis permeases rocC and rocE, which probably transport arginine or ornithine. These proteins seem to contain up to 12 transmembrane segments. As a signature for this family of proteins, the best conserved region, which is located in the second transmembrane segment, has been selected.

Consensus pattern: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-x-[LIVMFWSTAGC](2)-[STAGC]-x(3)-[LIVMFYWT]-x-[LIVMST]-x(3)-[LIVMCTA]-[GA]-E-x(5)-[PSAL]

[ 1] Weber E., Chevallier M.R., Jund R. J. Mol. Evol. 27:341-350(1988).
[ 2] Vandenbol M., Jauniaux J.-C., Grenson M. Gene 83:153-159(1989).

[ 3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci. 2:20-30(1993).

38. aakinase (1) Glutamate 5-kinase signature

[0226] Glutamate 5-kinase (EC 2.7.2.11) (gamma-glutamyl kinase) (GK) is the enzyme that catalyzes the first step in the biosynthesis of proline from glutamate, the ATP-dependent phosphorylation of L-glutamate into L-glutamate 5-phosphate. In eubacteria (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein, while in plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: a N-terminal GK domain and a C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see <PDOC00940>). As a signature pattern, a highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected. Yeast hypothetical protein YHR033w is highly similar to GK.

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G

[ 1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).
[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

aakinase (2) Aspartokinase signature 

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction
can then be used in the biosynthesis of lysine or in the pathway leading to homoserine, which participates in the
biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different isozymes which differ
in their sensitivity to repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are bifunctional
enzymes which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1
is involved in threonine biosynthesis and AK2 in that of methionine. The third isozyme, AK3 (gene lysC), is
monofunctional and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). As a signature
pattern for AK, a conserved region located in the N-terminal extremity has been selected.
Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]
[ 1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).

aakinase (3) Gamma-glutamyl phosphate reductase signature

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in
the biosynthesis of proline from glutamate, the NADP-dependent reduction of L-glutamate 5-phosphate into
L-glutamate 5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional
protein, while in plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal
glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain. As a signature pattern,
a conserved region that contains two histidine residues has been selected. This region is located in the last third of GPR.
Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[ 1] Pearson B.M., Hernando Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).
[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes.

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag
J, Sussman JL, Verschueren KHG, Goldman A, Protein Eng 1992;5:197-211.

40. (Acid phosphat) Histidine acid phosphatases signatures

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters,
optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes,
share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines
seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section
and forms a phosphohistidine intermediate, while the second is located in the C-terminal section and possibly acts as
proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA).
- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).
- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).
- Fission yeast acid phosphatase (gene pho1).
- Aspergillus phytases A and B (EC 3.1.3.8) (gene phyA and phyB).
- Mammalian lysosomal acid phosphatase.
- Mammalian prostatic acid phosphatase.
- Caenorhabditis elegans hypothetical proteins B0361.7, C05C10.1, C05C10.4 and F26C11.1.

[0231] Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS] [H is the phosphohistidine residue]

Consensus pattern: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-[STA] [H is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for rat prostatic acid
phosphatase, which seems to have Tyr instead of the active site His.

[ 1] van Etten R.L., Davidson R., Stevis P.E., MacArthur H., Moore D.L. J. Biol. Chem. 266:2313-2319(1991).
[ 2] Ostanin K., Harms E.H., Stevis P.E., Kuciel R., Zhou M.-M., van Etten R.L. J. Biol. Chem. 267:22830-22836(1992).
[ 3] Schneider G., Lindqvist Y., Vihko P. EMBO J. 12:2609-2615(1993).

41. Aconitase family signatures 

[0232] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes
the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the
course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial
matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three
cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family
also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein
that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta
aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase
activity. - 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the
second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that
participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into
homoisocitric acid. - Escherichia coli protein ybhJ.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA] [C binds the iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the iron-sulfur center]

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).

42. Actins signatures 

[0233] Actins [1 to 4] are highly conserved contractile proteins that are present in all eukaryotic cells. In vertebrates
there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and
are a major constituent of the contractile apparatus. The beta and gamma actins coexist in most cell types as
components of the cytoskeleton and as mediators of internal cell motility. In plants [5] there are many isoforms which are
probably involved in a variety of functions such as cytoplasmic streaming, cell shape determination, tip growth,
graviperception, cell wall deposition, etc. Actin exists either in a monomeric form (G-actin) or in a polymerized form (F-actin).
Each actin monomer can bind a molecule of ATP; when polymerization occurs, the ATP is hydrolyzed. Actin is a protein
of from 374 to 379 amino acid residues. The structure of actin has been highly conserved in the course of evolution.
Recently some divergent actin-like proteins have been identified in several species. These proteins are: - Centractin
(actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii (actin-II). Centractin
seems to be a component of a multi-subunit centrosomal complex involved in microtubule based vesicle motility. This
subfamily is also known as ARP1. - ARP2 subfamily, which includes chicken ACTL, yeast ACT2, Drosophila 14D and C.
elegans actC. - ARP3 subfamily, which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and fission yeast
act2. - ARP4 subfamily, which includes yeast ACT3 and Drosophila 13E. Three signature patterns have been developed.
The first two are specific to actins and span positions 54 to 64 and 357 to 365. The last signature picks up both actins
and the actin-like proteins and corresponds to positions 106 to 118 in actins.
Consensus pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
Consensus pattern: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]
Consensus pattern: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]

[ 1] Sheterline P., Clayton J., Sparrow J.C. (In) Actins, 3rd Edition, Academic Press Ltd, London, (1996).
[ 2] Pollard T.D., Cooper J.A. Annu. Rev. Biochem. 55:987-1036(1986).
[ 3] Pollard T.D. Curr. Opin. Cell Biol. 1:33-40(1990).
[ 4] Rubenstein P.A. BioEssays 12:309-315(1990).
[ 5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990).

43. Adenylate kinase signature 

[0234] Adenylate kinase (EC 2.7.4.3) (AK) [1] is a small monomeric enzyme that catalyzes the reversible transfer of
a phosphate group from MgATP to AMP (MgATP + AMP = MgADP + ADP). In mammals there are three different isozymes: -
AK1 (or myokinase), which is cytosolic. - AK2, which is located in the outer compartment of mitochondria. - AK3 (or
GTP:AMP phosphotransferase), which is located in the mitochondrial matrix and which uses MgGTP instead of MgATP.
The sequence of AK has also been obtained from different bacterial species and from plants and fungi. Two other
enzymes have been found to be evolutionarily related to AK. These are: - Yeast uridylate kinase (EC 2.7.4.-) (UK)
(gene URA6) [2], which catalyzes the transfer of a phosphate group from ATP to UMP to form UDP and ADP. - Slime mold
UMP-CMP kinase (EC 2.7.4.14) [3], which catalyzes the transfer of a phosphate group from ATP to either CMP or UMP to
form CDP or UDP and ADP. Several regions of AK family enzymes are well conserved, including the ATP-binding domains.
The most conserved of all regions has been selected as a signature for this type of enzyme. This region includes an
aspartic acid residue that is part of the catalytic cleft of the enzyme and that is involved in a salt bridge. It also includes
an arginine residue whose modification leads to inactivation of the enzyme.
Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]

[ 1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987).
[ 2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun. 165:464-473(1989).
[ 3] Wiesmueller L., Noegel A.A., Barzu O., Gerisch G., Schleicher M. J. Biol. Chem. 265:6339-6345(1990).
[ 4] Kath T.H., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993).
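
By way of illustration only, a sequence may be scanned for the adenylate kinase signature by writing the consensus pattern as a regular expression and reporting where it matches. The sketch below assumes the pattern reads [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]; the function name find_ak_signature and the toy peptide are hypothetical and are not sequences from the present disclosure.

```python
import re

# Hand translation of [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ] (assumed reading).
AK_SIGNATURE = re.compile(r"[LIVMFYW]{3}DG[FYI]PR.{3}[NQ]")


def find_ak_signature(sequence):
    """Return (start, end, matched_text) tuples, 1-based and inclusive."""
    return [
        (m.start() + 1, m.end(), m.group())
        for m in AK_SIGNATURE.finditer(sequence.upper())
    ]


# Hypothetical toy peptide built to contain one match.
toy = "MKLIVDGFPRTVEQAAA"
print(find_ak_signature(toy))
# -> [(3, 14, 'LIVDGFPRTVEQ')]
```

The conserved aspartic acid and arginine residues mentioned above correspond to the fixed D and R positions of the expression.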

[0235] 44. (adh_short) Short-chain dehydrogenases/reductases family signature.

[0236] The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which
are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized
was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently

known to belong to this family are listed below.
- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
- D-beta-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni.
- 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity.
- Nodulation protein nodG from species of Azospirillum and Rhizobium, which is probably involved in the modification of the nodulation Nod factor fatty acyl chain.
- Nitrogen fixation protein fixR from Bradyrhizobium japonicum.
- Bacillus subtilis protein dltE, which is involved in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1 (FVT1).
- Mouse adipocyte protein p27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical protein yciK.
- Escherichia coli hypothetical protein ydfG.
- Escherichia coli hypothetical protein yjgI.
- Escherichia coli hypothetical protein yjgU.
- Escherichia coli hypothetical protein yohF.
- Bacillus subtilis hypothetical protein yoxD.
- Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH.
- Yeast hypothetical protein YIL124w.
- Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c.
- Yeast hypothetical protein YKL055c.
- Fission yeast hypothetical protein SpAC23D3.11.
One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a
lysine, has been selected as a signature pattern for this family of proteins. The tyrosine residue participates in the
catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:6003-6013(1995).
[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).
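
The short-chain dehydrogenase signature contains one excluded position, written {PC}, meaning any residue except proline or cysteine directly after the conserved lysine. The following sketch, which assumes the pattern reads [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM], shows how such an exclusion can be expressed and checked. The function name and both peptides are hypothetical constructions for illustration, not sequences of the invention.

```python
import re

# {PC} in the consensus pattern becomes the negated class [^PC].
SDR_SIGNATURE = re.compile(
    r"[LIVSPADNK].{12}Y[PSTAGNCV][STAGNQCIVM][STAGC]K[^PC]"
    r"[SAGFYR][LIVMSTAGD].{2}[LIVMFYW].{3}[LIVMFYWGAPTHQ][GSACQRHM]"
)


def matches_sdr_signature(peptide):
    """True if the whole peptide is one instance of the signature."""
    return SDR_SIGNATURE.fullmatch(peptide.upper()) is not None


# Hypothetical 29-residue peptide; the tyrosine sits at position 14.
toy = "L" + "A" * 12 + "YPSSK" + "A" + "SL" + "AA" + "L" + "AAA" + "LG"
# Same peptide with proline at the excluded position after the lysine.
variant = toy[:18] + "P" + toy[19:]

print(matches_sdr_signature(toy))      # True
print(matches_sdr_signature(variant))  # False
```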
[0237] 45. (adh_short_C2) Short-chain dehydrogenases/reductases family signature 

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are
known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was
Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type', or 'short-chain' alcohol
dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently
known to belong to this family are listed below.
- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- Acetoin dehydrogenase (EC 1.1.1.5) from Klebsiella terrigena (gene budC).
- D-beta-hydroxybutyrate dehydrogenase (BDH) (EC 1.1.1.30) from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-beta-hydroxysteroid dehydrogenase (EC 1.1.1.51) from Comamonas testosteroni.
- 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene gno).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-d-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from Escherichia coli (gene gutD) and from Klebsiella pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC 1.1.1.233) from Flavobacterium strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC 1.1.-.-) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC 1.3.1.-) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity.
- Nodulation protein nodG from species of Azospirillum and Rhizobium, which is probably involved in the modification of the nodulation Nod factor fatty acyl chain.
- Nitrogen fixation protein fixR from Bradyrhizobium japonicum.
- Bacillus subtilis protein dltE, which is involved in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1 (FVT1).
- Mouse adipocyte protein p27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical protein yciK.
- Escherichia coli hypothetical protein ydfG.
- Escherichia coli hypothetical protein yjgI.
- Escherichia coli hypothetical protein yjgU.
- Escherichia coli hypothetical protein yohF.
- Bacillus subtilis hypothetical protein yoxD.
- Bacillus subtilis hypothetical protein ywfD.
- Bacillus subtilis hypothetical protein ywfH.
- Yeast hypothetical protein YIL124w.
- Yeast hypothetical protein YIR035c.
- Yeast hypothetical protein YIR036c.
- Yeast hypothetical protein YKL055c.
- Fission yeast hypothetical protein SpAC23D3.11.
One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a
lysine, has been used as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic
mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[ 1] Joernvall H., Persson B., Krook M., Atrian S., Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:6003-6013(1995).
[ 2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[ 3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).
[ 4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).

[0238] 46. (adh_zinc) Zinc-containing alcohol dehydrogenases signatures

Alcohol dehydrogenase (EC 1.1.1.1) (ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the
concomitant reduction of NAD [1]. Currently three structurally and catalytically different types of alcohol
dehydrogenases are known:
- Zinc-containing 'long-chain' alcohol dehydrogenases.
- Insect-type, or 'short-chain' alcohol dehydrogenases.
- Iron-containing alcohol dehydrogenases.
Zinc-containing ADH's [2,3] are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit. One of
the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either cysteine
or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADH's are
found in bacteria, mammals, plants, and in fungi. In most species there is more than one isozyme (for example,
humans have at least six isozymes, yeast has three, etc.). A number of other zinc-dependent dehydrogenases are
closely related to zinc ADH [4]; these are:
- Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase).
- Sorbitol dehydrogenase (EC 1.1.1.14).
- Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase).
- Threonine 3-dehydrogenase (EC 1.1.1.103).
- Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme involved in the biosynthesis of lignin.
- Galactitol-1-phosphate dehydrogenase (EC 1.1.1.251).
- Pseudomonas putida 5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6].
- Escherichia coli starvation sensing protein rspB.
- Escherichia coli hypothetical protein yjgB.
- Escherichia coli hypothetical protein yjgV.
- Escherichia coli hypothetical protein yjjN.
- Yeast hypothetical protein YAL060w (FUN49).
- Yeast hypothetical protein YAL061w (FUN50).
- Yeast hypothetical protein YCR105w.
The pattern that has been developed to detect this class of enzymes is based on a conserved region that
includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes
NADP-dependent quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals
where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin
[7]. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases
and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related
to qor. A specific pattern has been developed for this subfamily.
Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand]
Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]

[ 1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[ 2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[ 3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).
[ 4] Persson B., Hallborn J., Walfridsson M., Hahn-Haegerdal B., Keraenen S., Penttilae M., Joernvall H. FEBS Lett. 324:9-14(1993).
[ 5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).
[ 6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[ 7] Joernvall H., Persson B., Du Bois G., Lavers G.C., Chen J.H., Gonzalez P., Rao P.V., Zigler J.S. Jr. FEBS Lett. 322:240-244(1993).

[0239] 47. (aldedh) Aldehyde dehydrogenases active sites

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide variety of aliphatic
and aromatic aldehydes. In mammals at least four different forms of the enzyme are known [1]: class-1 (or Ald C), a
tetrameric cytosolic enzyme; class-2 (or Ald M), a tetrameric mitochondrial enzyme; class-3 (or Ald D), a dimeric cytosolic
enzyme; and class IV, a microsomal enzyme. Aldehyde dehydrogenases have also been sequenced from fungal and
bacterial species. A number of enzymes are known to be evolutionarily related to aldehyde dehydrogenases; these
enzymes are listed below.
- Plant and bacterial betaine-aldehyde dehydrogenase (EC 1.2.1.8) [2], an enzyme that catalyzes the last step in the biosynthesis of betaine.
- Plant and bacterial NADP-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.9).
- Escherichia coli succinate-semialdehyde dehydrogenase (NADP+) (EC 1.2.1.16) (gene gabD) [3], which oxidizes succinate semialdehyde into succinate.
- Escherichia coli lactaldehyde dehydrogenase (EC 1.2.1.22) (gene ald) [4].
- Mammalian succinate semialdehyde dehydrogenase (NAD+) (EC 1.2.1.24).
- Escherichia coli phenylacetaldehyde dehydrogenase (EC 1.2.1.39).
- Escherichia coli 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (gene hpcC).
- Pseudomonas putida 2-hydroxymuconic semialdehyde dehydrogenase [5] (genes dmpC and xylG), an enzyme in the meta-cleavage pathway for the degradation of phenols, cresols and catechol.
- Bacterial and mammalian methylmalonate-semialdehyde dehydrogenase (MMSDH) (EC 1.2.1.27) [6], an enzyme involved in the distal pathway of valine catabolism.
- Yeast delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.5.1.12) [7] (gene PUT2), which converts proline to glutamate.
- Bacterial multifunctional putA protein, which contains a delta-1-pyrroline-5-carboxylate dehydrogenase domain.
- 26G, a garden pea protein of unknown function which is induced by dehydration of shoots [8].
- Mammalian formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [9]. This is a cytosolic enzyme responsible for the NADP-dependent decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino acids which consists of three domains; the C-terminal domain (480 residues) is structurally and functionally related to aldehyde dehydrogenases.
- Yeast hypothetical protein YBR006w.
- Yeast hypothetical protein YER073w.
- Yeast hypothetical protein YHR039c.
- Caenorhabditis elegans hypothetical protein F01F1.6.
A glutamic acid and a cysteine residue have been implicated in the catalytic activity of mammalian
aldehyde dehydrogenase. These residues are conserved in all the enzymes of this family. Two patterns have been
derived for this family, one for each of the active site residues.

Consensus pattern: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV] [E is the active site residue]
Consensus pattern: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR] [C is the active site residue]

[ 1] Hempel J., Harper K., Lindahl R. Biochemistry 28:1160-1167(1989).
[ 2] Weretilnyk E.A., Hanson A.D. Proc. Natl. Acad. Sci. U.S.A. 87:2745-2749(1990).
[ 3] Niegemann E., Schulz A., Bartsch K. Arch. Microbiol. 160:454-460(1993).
[ 4] Hidalgo E., Chen Y.-M., Lin E.C.C., Aguilar J. J. Bacteriol. 173:6118-6123(1991).
[ 5] Nordlund I., Shingler V. Biochim. Biophys. Acta 1049:227-230(1990).
[ 6] Steele M.I., Lorenz D., Hatter K., Park A., Sokatch J.R. J. Biol. Chem. 267:13585-13592(1992).
[ 7] Krzywicki K.A., Brandriss M.C. Mol. Cell. Biol. 4:2837-2842(1984).
[ 8] Guerrero F.D., Jones J.T., Mullet J.E. Plant Mol. Biol. 15:11-26(1990).
[ 9] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).
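
Because each of the two aldehyde dehydrogenase patterns is built around one active site residue, a match can also be used to report the position of that residue. The sketch below assumes the two patterns read [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV] and [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR]; the dictionary and function names and the toy peptide are hypothetical and serve only as an illustration.

```python
import re

# The capturing group in each expression marks the active site residue.
ACTIVE_SITE_PATTERNS = {
    "glutamate": re.compile(r"[LIVMFGA](E)[LIMSTAC][GS]G[KNLM][SADN][TAPFV]"),
    "cysteine": re.compile(r"[FYLVA].{3}G[QE].(C)[LIVMGSTANC][AGCN].[GSTADNEKR]"),
}


def locate_active_sites(sequence):
    """Map each pattern name to the 1-based position of its active site
    residue in the first match, or None when the pattern is not found."""
    hits = {}
    for name, regex in ACTIVE_SITE_PATTERNS.items():
        m = regex.search(sequence.upper())
        hits[name] = m.start(1) + 1 if m else None
    return hits


# Hypothetical toy peptide containing one instance of each pattern.
toy = "MKLELGGKSPTTTFAAAGQACLAAGKK"
print(locate_active_sites(toy))
# -> {'glutamate': 4, 'cysteine': 21}
```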

[0241] 48. Aldo/keto reductase family signatures

The aldo-keto reductase family [1,2] groups together a number of structurally and functionally related NADPH-dependent
oxidoreductases as well as some other proteins. The proteins known to belong to this family are:
- Aldehyde reductase (EC 1.1.1.2).
- Aldose reductase (EC 1.1.1.21).
- 3-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.50), which terminates androgen action by converting 5-alpha-dihydrotestosterone to 3-alpha-androstanediol.
- Prostaglandin F synthase (EC 1.1.1.188), which catalyzes the reduction of prostaglandins H2 and D2 to F2-alpha.
- D-sorbitol-6-phosphate dehydrogenase (EC 1.1.1.200) from apple.
- Morphine 6-dehydrogenase (EC 1.1.1.218) from Pseudomonas putida plasmid pMDH7.2 (gene morA).
- Chlordecone reductase (EC 1.1.1.225), which reduces the pesticide chlordecone (kepone) to the corresponding alcohol.
- 2,5-diketo-D-gluconic acid reductase (EC 1.1.1.-), which catalyzes the reduction of 2,5-diketogluconic acid to 2-keto-L-gulonic acid, a key intermediate in the production of ascorbic acid.
- NAD(P)H-dependent xylose reductase (EC 1.1.1.-) from the yeast Pichia stipitis. This enzyme reduces xylose into xylitol.
- Trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20).
- 3-oxo-5-beta-steroid 4-dehydrogenase (EC 1.3.99.6), which catalyzes the reduction of delta(4)-3-oxosteroids.
- A soybean reductase, which co-acts with chalcone synthase in the formation of 4,2',4'-trihydroxychalcone.
- Frog eye lens rho crystallin.
- Yeast GCY protein, whose function is not known.
- Leishmania major P110/11E protein. P110/11E is a developmentally regulated protein whose abundance is markedly elevated in promastigotes compared with amastigotes. Its exact function is not yet known.
- Escherichia coli hypothetical protein yafB.
- Escherichia coli hypothetical protein yghE.
- Yeast hypothetical protein YBR149w.
- Yeast hypothetical protein YHR104w.
- Yeast hypothetical protein YJR096w.
These proteins all have about 300 amino acid residues. Three consensus patterns have been developed that are
specific to this family of proteins. The first one is located in the N-terminal section of these proteins. The second
pattern is located in the central section. The third pattern, located in the C-terminal section, is centered on a lysine
residue whose chemical modification, in aldose and aldehyde reductases, affects the catalytic efficiency.

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G
Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]
Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site residue]

[1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

[0242] 49. Alpha amylase. This family is classified as family 13 of the glycosyl hydrolases. The structure is an 8-stranded alpha/beta barrel, interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] Larson SB, Greenwood A, Cascio D, Day J, McPherson A; J Mol Biol 1994;235:1560-1584.
[0243] 50. Aminotransferases class-I pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-I, currently consists of the following enzymes:

- Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes: one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found (gene aspC).
- Tyrosine aminotransferase (EC 2.6.1.5), which catalyzes the first step in tyrosine catabolism by reversibly transferring its amino group to 2-oxoglutarate to form 4-hydroxyphenylpyruvate and L-glutamate.
- Aromatic aminotransferase (EC 2.6.1.57), involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB).
- 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the first step in ethylene biosynthesis.
- Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis.
- Yeast hypothetical protein YJL060w.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA] [K is the pyridoxal-P attachment site]

[1] Bairoch A. Unpublished observations (1992).
[2] Sung M.H., Tanizawa K., Tanaka H., Kuramitsu S., Kagamiyama H., Hirotsu K., Okamoto A., Higuchi T., Soda K. J. Biol. Chem. 266:2567-2572(1991).



[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1] into subfamilies. One of these, called class-II, currently consists of the following enzymes:

- Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form 2-amino-3-oxobutanoate (gene kbl).
- 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway, i.e. the addition of succinyl-CoA to glycine to form 5-aminolevulinate.
- 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial enzyme (gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the addition of 6-carboxy-hexanoyl-CoA to alanine to form 8-amino-7-oxononanoate.
- Histidinol-phosphate aminotransferase (EC 2.6.1.9), which catalyzes the eighth step in the histidine biosynthetic pathway: the transfer of an amino group from 3-(imidazol-4-yl)-2-oxopropyl phosphate to glutamic acid to form histidinol phosphate and 2-oxoglutarate.
- Serine palmitoyltransferase (EC 2.3.1.50) from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form 3-ketosphinganine.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG] [K is the pyridoxal-P attachment site]

[1] Bairoch A. Unpublished observations (1991).
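
The class-II consensus pattern marks one residue, the lysine, as the pyridoxal-P attachment site. As a minimal sketch only (it is not part of the described procedure), the position of that lysine in a query sequence could be reported by translating the pattern by hand into a regular expression with a capture group around K. The names `CLASS_II` and `attachment_lysines` and the test peptides are hypothetical and chosen purely for illustration.

```python
import re

# Hand translation of the class-II pattern
#   T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG]
# The parentheses around K capture the annotated attachment-site lysine.
CLASS_II = re.compile(r"T[LIVMFYW][STAG](K)[SAG][LIVMFYWR][SAG]..[SAG]")


def attachment_lysines(sequence):
    """Return 1-based positions of the lysine in every pattern match."""
    return [m.start(1) + 1 for m in CLASS_II.finditer(sequence)]


if __name__ == "__main__":
    # Made-up peptide containing one copy of the motif (TLSKALGSSG).
    print(attachment_lysines("MMTLSKALGSSGG"))
```

Reporting the residue position rather than only a yes/no match keeps the annotation carried by the pattern (which residue is the attachment site) available to later analysis steps.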

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-III, currently consists of the following enzymes:

- Acetylornithine aminotransferase (EC 2.6.1.11), which catalyzes the transfer of an amino group from acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid.
- Ornithine aminotransferase (EC 2.6.1.13), which catalyzes the transfer of an amino group from ornithine to alpha-ketoglutarate, yielding glutamic-5-semi-aldehyde and glutamic acid.
- Omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18), which catalyzes transamination between a variety of omega-amino acids, mono- and diamines, and pyruvate. It plays a pivotal role in omega-amino acid metabolism.
- 4-aminobutyrate aminotransferase (EC 2.6.1.19) (GABA transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate semialdehyde and glutamic acid.
- DAPA aminotransferase (EC 2.6.1.62), a bacterial enzyme (gene bioA) which catalyzes an intermediate step in the biosynthesis of biotin, the transamination of 7-keto-8-aminopelargonic acid (7-KAP) to form 7,8-diaminopelargonic acid (DAPA).
- 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), a Pseudomonas cepacia enzyme (gene dgdA) that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl ketone, alanine and carbon dioxide.
- Glutamate-1-semialdehyde aminotransferase (EC 5.4.3.8) (GSA). GSA is the enzyme involved in the second step of porphyrin biosynthesis, via the C5 pathway. It transfers the amino group on carbon 2 of glutamate-1-semialdehyde to the neighbouring carbon, to give delta-aminolevulinic acid.
- Bacillus subtilis aminotransferase yhxA.
- Bacillus subtilis aminotransferase yodT.
- Haemophilus influenzae aminotransferase HI0949.
- Caenorhabditis elegans aminotransferase T01B11.2.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [LIVMFYWC](2)-x-D-E-[IVA]-x(2)-G-[LIVMFAGC]-x(0,1)-[RSACLI]-x-[GSAD]-x(12,16)-D-[LIVMFC]-[LIVMFYSTA]-x(2)-[GSA]-K-x(3)-[GSTADNV]-[GSAC] [K is the pyridoxal-P attachment site]

[1] Bairoch A. Unpublished observations (1992).
[2] Yonaha K., Nishie M., Aibara S. J. Biol. Chem. 267:12506-12510(1992).

[0246] 53. Ank repeat. There is no clear separation between noise and signal on the HMM search. Ankyrin repeats generally consist of a beta, alpha, alpha, beta order of secondary structures. The repeats associate to form a higher order structure.

[1] A, Holak TA; FEBS Lett 1997;401:127-132.
[2] Lux SE, John KM, Bennett V; Nature 1990;345:736-739.

[0247] 54. Aminotransferases class-IV signature

[0248] Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-IV, currently consists of the following enzymes:

- Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a bacterial (gene ilvE) and eukaryotic enzyme which catalyzes the reversible transfer of an amino group from 4-methyl-2-oxopentanoate to glutamate, to form leucine and 2-oxoglutarate.

- D-alanine aminotransferase (EC 2.6.1.21). A bacterial enzyme which catalyzes the transfer of the amino group from D-alanine (and other D-amino acids) to 2-oxoglutarate, to form pyruvate and D-glutamate.

- 4-amino-4-deoxychorismate (ADC) lyase (gene pabC). A bacterial enzyme that converts ADC into 4-aminobenzoate (PABA) and pyruvate.

[0249] The above enzymes are proteins of about 270 to 415 amino-acid residues that share a few regions of sequence similarity. Surprisingly, the best-conserved region does not include the lysine residue to which the pyridoxal-phosphate group is known to be attached in ilvE. The region that has been selected as a signature pattern is located some 40 residues on the C-terminal side of the PlP-lysine.
Consensus pattern: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-[GS]-[LI...

[1] Green J.M., Merkel W.K., Nichols B.P. J. Bacteriol. 174:5317-5323(1992).
[2] Bairoch A. Unpublished observations (1992).

[0250] 55. Aminotransferases class-V pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-V, currently consists of the following enzymes:

- Phosphoserine aminotransferase (EC 2.6.1.52), an enzyme which catalyzes the reversible interconversion of phosphoserine and 2-oxoglutarate to 3-phosphonooxypyruvate and glutamate. It is required both in the major phosphorylated pathway of serine biosynthesis and in pyridoxine biosynthesis. The bacterial enzyme (gene serC) is highly similar to a rabbit endometrial progesterone-induced protein (EPIP), which is probably a phosphoserine aminotransferase [3].
- Serine-glyoxylate aminotransferase (EC 2.6.1.45) (SGAT) (gene sgaA) from Methylobacterium extorquens.
- Serine-pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine-glyoxylate aminotransferase (EC 2.6.1.44). In vertebrates, it is located in the peroxisomes and/or mitochondria.
- Isopenicillin N epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and catalyzes the reversible isomerization of isopenicillin N and penicillin N.
- NifS, a protein of the nitrogen fixation operon of some bacteria and cyanobacteria. The exact function of nifS is not yet known. A highly similar protein has been found in fungi (gene NFS1 or SPL1).
- The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-).
- Hypothetical protein ycbU from Bacillus subtilis.
- Hypothetical protein YFL030w from yeast.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.
Consensus pattern: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).
[2] Bairoch A. Unpublished observations (1992).
[3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989).

[0251] 56. Annexins repeated domain signature

Annexins [1 to 6] are a group of calcium-binding proteins that associate reversibly with membranes. They bind to phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase inhibition, intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists of an N-terminal domain of variable length followed by four or eight copies of a conserved segment of sixty-one residues. The repeat (sometimes known as an 'endonexin fold') consists of five alpha-helices that are wound into a right-handed superhelix [7]. The proteins known to belong to the annexin family are listed below:

- Annexin I (Lipocortin 1) (Calpactin 2) (p35) (Chromobindin 9).
- Annexin II (Lipocortin 2) (Calpactin 1) (Protein I) (p36) (Chromobindin 8).
- Annexin III (Lipocortin 3) (PAP-III).
- Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4).
- Annexin V (Lipocortin 5) (Endonexin 2) (VAC-alpha) (Anchorin CII) (PAP-I).
- Annexin VI (Lipocortin 6) (Protein III) (Chromobindin 20) (p68) (p70). This is the only known annexin that contains 8 (instead of 4) repeats.
- Annexin VII (Synexin).
- Annexin VIII (Vascular anticoagulant-beta) (VAC-beta).
- Annexin IX from Drosophila.
- Annexin X from Drosophila.
- Annexin XI (Calcyclin-associated annexin) (CAP-50).
- Annexin XII from Hydra vulgaris.
- Annexin XIII (Intestine-specific annexin) (ISA).

The signature pattern for this domain spans positions 9 to 61 of the repeat and includes the only perfectly conserved residue (an arginine in position 22).

Consensus pattern: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-[IFY]-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF]

[1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:63-93(1994).
[2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).
[3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989).
[4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48-50(1989).
[5] Klee C.B. Biochemistry 27:6645-6653(1988).
[6] Smith P.D., Moss S.E. Trends Genet. 10:241-246(1994).
[7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990).
[8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARF) [1,2,3,4] are 20 Kd GTP-binding proteins involved in protein trafficking. They may modulate vesicle budding and uncoating within the Golgi apparatus. ARFs also act as allosteric activators of cholera toxin ADP-ribosyltransferase activity. They are evolutionarily conserved and present in all eukaryotes. At least six forms of ARF are present in mammals and three in budding yeast. The ARF family also includes proteins highly related to ARFs but which lack the cholera toxin cofactor activity; they are collectively known as ARLs (ARF-like). ARD1 is a 64 Kd mammalian protein of unknown biological function that contains an ARF domain at its C-terminal extremity. Proteins from the ARF family are generally included in the RAS 'superfamily' of small GTP-binding proteins [5], but they are only slightly related to the other RAS proteins. They also differ from RAS proteins in that they lack cysteine residues at their C-termini and are therefore not subject to prenylation. The ARFs are N-terminally myristoylated (the ARLs have not yet been shown to be modified in such a fashion). A conserved region in the C-terminal part of ARFs and ARLs has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM]
Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).

[1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995).
[2] Moss J., Vaughan M. Cell. Signal. 4:367-399(1993).
[3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994).
[5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has been noted are listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
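
The P-loop consensus pattern is written in PROSITE syntax, in which 'x' stands for any residue, square brackets enumerate the residues allowed at a position, and a number in parentheses is a repeat count. As an illustrative sketch only (not part of the described procedure), such a pattern could be translated into a regular expression and scanned against a protein sequence as shown below. The function names `prosite_to_regex` and `find_p_loops` and the test peptides are hypothetical, and the translator is assumed to need only the subset of PROSITE syntax that appears in these entries.

```python
import re

# One PROSITE element: a residue, 'x', an allowed set [..] or a forbidden
# set {..}, optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


P_LOOP = "[AG]-x(4)-G-K-[ST]"


def find_p_loops(sequence):
    """Return (1-based start, matched text) for every P-loop match.

    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(P_LOOP) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex(P_LOOP))
    print(find_p_loops("MTEYKLVVVGAGGVGKSALT"))
```

For the illustrative peptide MTEYKLVVVGAGGVGKSALT the scan reports a single match, GAGGVGKS, starting at residue 10. A motif that ends in Gly instead of Ser or Thr, the deviation described above for adenylate kinase, is not reported.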

[0254] 58. Arginase family signatures

The following enzymes have been shown [1] to be evolutionarily related:

- Arginase (EC 3.5.3.1), a ubiquitous enzyme which catalyzes the degradation of arginine to ornithine and urea [2].
- Agmatinase (EC 3.5.3.11) (agmatine ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea.
- Formiminoglutamase (EC 3.5.3.8) (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes N-formimino-glutamate into glutamate and formamide.
- Hypothetical proteins from methanogenic archaebacteria.

These enzymes are proteins of about 300 amino-acid residues. Three conserved regions that contain charged residues which are involved in the binding of the two manganese ions [3] can be used as signature patterns.
Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]
Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]
Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]

[1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).
[2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).
[3] Kanyo Z.F., Scolnick L.R., Ash D.E., Christianson D.W. Nature 383:554-557(1996).
[0255] 59. (asp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Currently known eukaryotic aspartyl proteases are:

- Vertebrate gastric pepsins A and C (also known as gastricsin).
- Vertebrate chymosin (rennin), involved in digestion and used for making cheese.
- Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
- Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma.
- Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21).
- Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases.
- Yeast barrier pepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone.
- Fission yeast sxa1, which is involved in degrading or processing the mating pheromones.

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]
Note: these proteins belong to families A1 and A2 in the classification of peptidases [4,E1].

[1] Foltmann B. Essays Biochem. 17:52-84(1981).
[2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990).
[3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[0256] 60. (BIRA) Biotin repressor 

[1] Wilson KP, Shewchuk LM, Brennan RG, Otsuka AJ, Matthews BW; Proc Natl Acad Sci USA 1992;89:9257-9261 
[0257] 61. BTB/POZ domain 

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain is present near the N-terminus of a fraction of zinc finger (zf-C2H2) proteins and in proteins that contain the Kelch motif, such as Kelch and a family of pox virus proteins. The BTB/POZ domain mediates homomeric dimerisation and in some instances heteromeric dimerisation [2]. The structure of the dimerised PLZF BTB/POZ domain has been solved and consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule [3]. POZ domains from several zinc finger proteins have been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes including N-CoR and SMRT [4,5,6]. The POZ or BTB domain is also known as BR-C/Ttk or ZiN.

[1] Zollman S, Godt D, Prive GG, Couderc JL, Laski FA; Proc Natl Acad Sci U S A 1994;91:10717-10721.
[2] Bardwell VJ, Treisman R; Genes Dev 1994;8:1664-1677.
[3] Ahmad KF, Engel CK, Prive GG; Proc Natl Acad Sci U S A 1998;95:12123-12128.
[4] Deweindt C, Albagli O, Bernardin F, Dhordain P, Quief S, Lantoine D, Kerckaert JP, Leprince D; Cell Growth Differ 1995;6:1495-1503.
[5] Huynh KD, Bardwell VJ; Oncogene 1998;17:2473-2484.
[6] Wong CW, Privalsky ML; J Biol Chem 1998;273:27695-27702.

[0258] 62. (Bac GSPproteins) Bacterial type II secretion system protein D signature

A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of proteins (also called the type II pathway) [1 to 5], have been found to be evolutionarily related. These proteins are listed below:

- The 'D' protein from the GSP operon of: Aeromonas (gene exeD); Erwinia (gene outD); Escherichia coli (gene yheF); Klebsiella pneumoniae (gene pulD); Pseudomonas aeruginosa (gene xcpQ); Vibrio cholerae (gene epsD) and Xanthomonas campestris (gene xpsD).
- comE from Haemophilus influenzae, involved in competence (DNA uptake).
- pilQ from Pseudomonas aeruginosa, which is essential for the formation of the pili.
- hofQ (hopQ) from Escherichia coli.
- hrpH from Pseudomonas syringae, which is involved in the secretion of a proteinaceous elicitor of the hypersensitivity response in plants.
- hrpA1 from Xanthomonas campestris pv. vesicatoria, which is also involved in the hypersensitivity response.
- mxiD from Shigella flexneri, which is involved in the secretion of the Ipa invasins which are necessary for penetration of intestinal epithelial cells.
- omc from Neisseria gonorrhoeae.
- yscC from Yersinia enterocolitica virulence plasmid pYV, which seems to be required for the export of the Yop virulence proteins.
- The gpIV protein from filamentous phages such as f1, ike, or m13. GpIV is said to be involved in phage assembly and morphogenesis.

These proteins all seem to start with a signal sequence and are thought to be integral proteins in the outer membrane. As a signature pattern, a conserved region in the C-terminal section of these proteins has been selected.

10 Consensus pattern: IGR].[DEQKG]4ST\^]4LIVMA](3HGA]-G-[UN^FY>x(11)-[LIVM]-P-lUVMFYV^ 
[GSAE]-x-[UVM]-P4LIVMFYW](2)-x(2HLV]-F 

[1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[2] Reeves P.J., Whitcombe D., Wharam S., Gibson M., Allison G., Bunce N., Barallon R., Douglas P., Mulholland V., Stevens S., Walker S., Salmond G.P.C. Mol. Microbiol. 8:443-456(1993).
[3] Martin P.R., Hobbs M., Free P.D., Jeske Y., Mattick J.S. Mol. Microbiol. 9:857-868(1993).
[4] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).
[5] Genin S., Boucher C.A. Mol. Gen. Genet. 243:112-118(1994).

[0259] 63. (Bac globin) Protozoan/cyanobacterial globins signature

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. Almost all globins belong to a large family (see <PDOC00793>); the only exceptions are the following proteins, which form a family of their own [2,3]:

- Monomeric hemoglobins from the protozoans Paramecium caudatum, Tetrahymena pyriformis and Tetrahymena thermophila.
- Cyanoglobin from the cyanobacterium Nostoc commune.
- Globins LI637 and LI410 from the chloroplast of the alga Chlamydomonas eugametos.
- Mycobacterium tuberculosis hypothetical protein MtCY48.23.

These proteins contain a conserved histidine which could be involved in heme-binding. As a signature pattern, a conserved region that ends with this residue was used.

Consensus pattern: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H

[1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[2] Takagi T. Curr. Opin. Struct. Biol. 3:413-418(1993).
[3] Couture M., Chamberland H., St-Pierre B., Lafontaine J., Guertin M. Mol. Gen. Genet. 243:185-197(1994).

[0260] 64. Band 7 protein family signature

Mammalian band 7 protein [1] (also known as 7.2B or stomatin) is an integral membrane phosphoprotein of red blood cells thought to regulate cation conductance by interacting with other proteins of the junctional complex of the membrane skeleton. Structurally, band 7 is evolutionarily related to the following proteins:

- Caenorhabditis elegans protein mec-2 [2]. Mec-2 positively regulates the activity of the putative mechanosensory transduction channel. It may link the mechanosensory channel and the microtubule cytoskeleton of the touch receptor neurons.
- Caenorhabditis elegans proteins sto-1 to sto-4.
- Caenorhabditis elegans protein unc-1.
- Escherichia coli hypothetical protein ybbK.
- Mycobacterium tuberculosis hypothetical protein MtCY277.09.
- Synechocystis strain PCC 6803 hypothetical protein slr1128.
- Methanococcus jannaschii hypothetical protein MJ0827.

Structurally, all these proteins consist of a short N-terminal domain which is followed by a transmembrane region and a variable size (from 170 to 350 residues) C-terminal domain. As a signature pattern, a conserved region located about 110 residues after the transmembrane domain was selected.

Consensus pattern: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-[KR]-[LIV]-E-[LIV]-[KR]

[1] Gallagher P.G., Forget B.G. J. Biol. Chem. 270:26358-26363(1995).
[2] Huang M., Gu G., Ferguson E.L., Chalfie M. Nature 378:292-295(1995).

[0261] 65. Barwin domain signatures

Barwin [1] is a barley seed protein of 125 residues that binds weakly a chitin analog. It contains six cysteines involved in disulfide bonds, as shown in the following schematic representation.

xxxxxxxxxxxxxxxCxxxxxxxxxxCxxxxCxCxxxxxxxxCxxxxxxxxxxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the patterns.

Barwin is closely related to the following proteins:

- Hevein, a wound-induced protein found in the latex of rubber trees.
- HEL, an Arabidopsis thaliana hevein-like protein [2].
- Win1 and win2, two wound-induced proteins from potato.
- Pathogenesis-related protein 4 from tobacco.

Hevein and the win1/2 proteins consist of an N-terminal chitin-binding domain followed by a barwin-like C-terminal domain. Barwin and its related proteins could be involved in a defense mechanism in plants. As signature patterns, two highly conserved regions that contain some of the cysteines were selected.
Consensus pattem: C-G-[KR]-C-L-x-V-x-N JThe two C*s are involved in disulfide bonds]- 
Consensus pattem: V-[DNhY-{EQ]-F-V-[DN]-C (C is involved In a disulfide bond]- 

[1] Svensson B., Svendsen I., Hoejrup P., Roepstorff P., Ludvigsen S., Poulsen F.M. Biochemistry 31:8767-8770(1992).

[2] Potter S., Uknes S., Lawton K., Winter A.M., Chandler D., Dimaio J., Novitzky R., Ward E., Ryals J. Mol. Plant Microbe Interact. 6:680-685(1993).

[0262] 66. (Bowman-Birk leg) Bowman-Birk serine protease inhibitors family signature 

PROSITE cross-reference(s). The Bowman-Birk inhibitor family [1] is one of the numerous families of serine proteinase inhibitors. As can be seen in the schematic representation, they have a duplicated structure and generally possess two distinct inhibitory sites:



xxCCxxCxxCxx#xxCxxCxxxxCxxxCxxxCxxxxCxx#xxCxxCxxCxxCxx

< 70 residues >

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.

[0263] These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In cereals they exist in two forms, one of which is a duplication of the basic structure shown above [2]. The pattern that was developed to pick up sequences belonging to this family of inhibitors is in the central part of the domain and includes four cysteines.
[0264] Consensus pattern: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C [The four C's are involved in disulfide bonds] Note: this pattern can be found twice in some duplicated cereal inhibitors.
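
Because the pattern may occur twice in duplicated inhibitors, a scan should report every occurrence rather than stop at the first. The following is a minimal sketch only: the regular expression is a hand translation of the consensus pattern above, and the function name `find_motif_starts` and the toy peptides are assumptions made for illustration, not sequences from this disclosure.

```python
import re

# Hand translation of the Bowman-Birk consensus pattern into a regular expression.
BOWMAN_BIRK = re.compile(
    r"C.{5,6}[DENQKRHSTA]C[PASTDH][PASTDK][ASTDV]C[NDKS][DEKRHSTA]C"
)

def find_motif_starts(sequence):
    """Return the 1-based start position of every match, overlapping ones included."""
    # A zero-width lookahead lets finditer test every position in turn.
    lookahead = re.compile("(?=" + BOWMAN_BIRK.pattern + ")")
    return [m.start() + 1 for m in lookahead.finditer(sequence)]

# Hypothetical peptides built by hand to contain the motif (not real inhibitors).
motif = "CGGGGGDCPPACNDC"
single = "WW" + motif + "WW"
duplicated = "WW" + motif + "WWWW" + motif + "WW"

print(find_motif_starts(single))
print(find_motif_starts(duplicated))
```

With these toy inputs the single-domain peptide yields one start position and the duplicated peptide yields two, mirroring the note about duplicated cereal inhibitors.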

[1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).

[2] Tashiro M., Hashino K., Shiozaki M., Ibuki F., Maki Z. J. Biochem. 102:297-306(1987).

[0265] 67. Pathogenesis-related protein Bet v I family signature 

[0266] A number of plant proteins, which all seem to be involved in pathogen defense response, are structurally related [1,2,3]. These proteins are:

- Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of type I allergic reactions in Europe, North America and USSR.
- Aln g I, the major pollen allergen from alder.
- Api g I, the major allergen from celery.
- Car b I, the major pollen allergen from hornbeam.
- Cor a I, the major pollen allergen from hazel.
- Mal d I, the major pollen allergen from apple.
- Asparagus wound-induced protein AoPR1.
- Kidney bean pathogenesis-related proteins 1 and 2.
- Parsley pathogenesis-related proteins PR1-1 and PR1-3.
- Pea disease resistance response proteins pI49, pI176 and DRRG49-C.
- Pea abscisic acid-responsive proteins ABR17 and ABR18.
- Potato pathogenesis-related proteins STH-2 and STH-21.
- Soybean stress-induced protein SAM22.

[0267] These proteins are thought to be intracellularly located. They contain from 155 to 160 amino acid residues. As a signature pattern, a conserved region located in the third quarter of these proteins has been selected.
Consensus pattern: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-[FY]

[1] Breiteneder H., Pettenburger K., Bito A., Valenta R., Kraft D., Rumpold H., Scheiner O., Breitenbach M. EMBO J. 8:1935-1938(1989).

[2] Crowell D., John M.E., Russell D., Amasino R.M. Plant Mol. Biol. 18:459-466(1992).
[3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992).

[0268] 68. bZIP transcription factors basic domain signature 

The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together proteins that contain a basic region mediating sequence-specific DNA-binding followed by a leucine zipper required for dimerization. This family is quite large, therefore only a partial list of some representative members appears here.

- Transcription factor AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17) oncogene v-jun.
- Jun-B and jun-D, probable transcription factors which are highly similar to jun/AP-1.
- The fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun.
- The fos-related proteins fra-1 and fos B.
- Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, ATF-6 and LRF-1.
- Maize Opaque 2, a trans-acting transcriptional activator involved in the regulation of the production of zein proteins during endosperm.
- Arabidopsis G-box binding factors GBF1 to GBF4, Parsley CPRF-1 to CPRF-3, Tobacco TAF-1 and wheat EMBP-1. All these proteins bind the G-box promoter elements of many plant genes.
- Drosophila protein Giant, which represses the expression of both the kruppel and knirps segmentation gap genes.
- Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that binds to fat body-specific enhancers of alcohol dehydrogenase and yolk protein genes.
- Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head morphogenesis.
- Caenorhabditis elegans skn-1, a developmental protein involved in the fate of ventral blastomeres in the early embryo.
- Yeast GCN4 transcription factor, a component of the general control system that regulates the expression of amino acid-synthesizing enzymes in response to amino acid starvation, and the related Neurospora crassa cpc-1 protein.
- Neurospora crassa cys-3, which turns on the expression of structural genes which encode sulfur-catabolic enzymes.
- Yeast MET28, a transcriptional activator of sulfur amino acids metabolism.
- Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some oxygen detoxification enzymes.
- Epstein-Barr virus trans-activator protein BZLF1.

Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]

[1] Hurst H.C. Protein Prof. 2:105-168(1995).
[2] Ellenberger T. Curr. Opin. Struct. Biol. 4:12-21(1994).
[0269] 69. Biotin-requiring enzymes attachment site 

Biotin, which plays a catalytic role in some carboxyl transfer reactions, is covalently attached, via an amide bond, to a lysine residue in enzymes requiring this coenzyme [1,2,3,4]. Such enzymes are:

- Pyruvate carboxylase (EC 6.4.1.1).
- Acetyl-CoA carboxylase (EC 6.4.1.2).
- Propionyl-CoA carboxylase (EC 6.4.1.3).
- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).
- Geranoyl-CoA carboxylase (EC 6.4.1.5).
- Urea carboxylase (EC 6.3.4.6).
- Oxaloacetate decarboxylase (EC 4.1.1.3).
- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).
- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).
- Methylmalonyl-CoA carboxyl-transferase (EC 2.1.3.1) (transcarboxylase).

Sequence data reveal that the region around the biocytin (biotin-lysine) residue is well conserved and can be used as a signature pattern.

[0270] Consensus pattern: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-[SAV] [K is the biotin attachment site] Note: the domain around the biotin-binding lysine residue is evolutionary related to that around the lipoyl-binding lysine residue of 2-oxo acid dehydrogenase acyltransferases.
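
When a pattern marks a specific functional residue, such as the biotinylated lysine here, a scan can report the position of that residue rather than the start of the match. The sketch below is illustrative only: the regular expression is a hand translation of the consensus pattern above with a named group on the lysine, and the function name `biotinyl_lysine_positions` and the toy peptide are assumptions, not part of this disclosure.

```python
import re

# Hand translation of the biotin attachment site pattern; the named group
# 'lysine' captures the K that carries the biotin.
BIOTIN_SITE = re.compile(
    r"[GN][DEQTR].[LIVMFY].{2}[LIVM].[AIV]M(?P<lysine>K)[LMAT].{3}[LIVM].[SAV]"
)

def biotinyl_lysine_positions(sequence):
    """Return 1-based positions of the candidate biotin-binding lysine."""
    return [m.start("lysine") + 1 for m in BIOTIN_SITE.finditer(sequence)]

# Hypothetical peptide built by hand to contain the motif (not a real enzyme).
toy_sequence = "WW" + "GDALAALAAMKLAAALAS" + "WW"
print(biotinyl_lysine_positions(toy_sequence))
```

A match only flags a candidate site; as with any signature pattern, it does not by itself show that the lysine is modified.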

[1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989).

[2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol. Chem. 263:6461-6464(1988).

[3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-278(1984).

[4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G., Samols D. J. Biol. Chem. 267:18407-18412(1992).

[0271] 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site 

The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and eukaryotic sources catalyze the oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. The three members of this family of multienzyme complexes are:

- Pyruvate dehydrogenase complex (PDC).
- 2-oxoglutarate dehydrogenase complex (OGDC).
- Branched-chain 2-oxo acid dehydrogenase complex (BCOADC).

These three complexes share a common architecture: they are composed of multiple copies of three component enzymes: E1, E2 and E3. E1 is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide acyltransferase, and E3 an FAD-containing dihydrolipamide dehydrogenase.

[0272] E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to a lysine group. The E2 components of OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either one (in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in Escherichia coli) lipoyl groups [3]. In addition to the E2 components of the three enzymatic complexes described above, a lipoic acid cofactor is also found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme complex of four protein components, which catalyzes the degradation of glycine. H protein shuttles the methylamine group of glycine from the P protein to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.
- Mammalian and yeast pyruvate dehydrogenase complexes differ from that of other sources, in that they contain, in small amounts, a protein of unknown function, designated protein X or component X. Its sequence is closely related to that of E2 subunits and seems to bind a lipoic group [5].
- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6]. This protein is most probably a dihydrolipamide acyltransferase involved in acetoin metabolism.

A signature pattern was developed which allows the detection of the lipoyl-binding site.

[0273] Consensus pattern: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-[STAIV]-[STAVQDN]-x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY] [K is the lipoyl-binding site] Note: the domain around the lipoyl-binding lysine residue is evolutionary related to that around the biotin-binding lysine residue of biotin requiring enzymes.

[1] Yeaman S.J. Biochem. J. 257:625-632(1989).
[2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986).
[3] Russel G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991).
[4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8836-8841(1986).
[5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A. 86:8732-8736(1989).
[6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol. 173:4056-4071(1991).

[0274] 70. C2 (C2 domain) Number of members: 295

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 116 amino-acid residues which is located between the two copies of the C1 domain (that bind phorbol esters and diacylglycerol) (see <PDOC00379>) and the protein kinase catalytic domain (see <PDOC00100>). Regions with significant homology [3,E1] to the C2-domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma and Drosophila isoforms PKC1 and PKC2.
- PKC isoforms delta, epsilon and eta, Caenorhabditis elegans kin-13 and yeast PKC1 have a C2-like domain at the N-terminal extremity [4].
- Yeast cAMP dependent protein kinase SCH9 contains a C2-like domain.
- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms beta, gamma and delta as well as several non-mammalian PI-PLCs have a C2-like domain C-terminal of the catalytic domain.
- Mammalian and plants phosphatidylinositol-3-kinase have a C2-like domain in the central region of the 110 Kd catalytic subunit.
- Yeast phosphatidylserine-decarboxylase 2 (gene PSD2) contains a C2 domain in its central region.
- Cytosolic phospholipase D from plants and cytosolic phospholipase A2 have a C2-like domain at their N-terminus.
- Synaptotagmins (p65). This is a family of related synaptic vesicle proteins that bind acidic phospholipids and that may have a regulatory role in the membrane interactions during trafficking of synaptic vesicles at the active zone of the synapse. All isoforms of synaptotagmins have two copies of the C2 domain in their C-terminal region.
- Rabphilin-3A, a synaptic protein, contains two C2 domains.
- Caenorhabditis elegans protein unc-13, whose function is not known. Unc-13 has a C2 domain in its central part and a C2-like domain at the C-terminus.
- rasGAP and the breakpoint cluster protein bcr have a C2-domain C-terminal of a PH-domain.
- Yeast protein BUD2 (or CLA2) has a C2-domain in the central region.
- Yeast protein RSP5 and human protein NEDD-4; both proteins also contain WW domains (see <PDOC50020>).
- Perforin (see <PDOC00251>) has a C2 domain at the C-terminus. It is the only extracellular protein known to contain a C2 domain.
- Yeast hypothetical protein YML072C has a C2 domain.
- Yeast hypothetical protein YNL087W has three C2 domains.
- Caenorhabditis elegans hypothetical protein F37A4.7 has two C2 domains.

The C2 domain is thought to be involved in calcium-dependent phospholipid binding [5]. Since domains related to the C2 domain are also found in proteins that do not bind calcium, other putative functions for the C2 domain, such as binding to inositol-1,3,4,5-tetraphosphate, have been suggested [6]. Recently, the 3D structure of the first C2 domain of synaptotagmin has been reported [7]; the domain forms an eight-stranded beta sandwich. The signature pattern that has been developed for the C2 domain is located in a conserved part of that domain, the connecting loop between beta strands 2 and 3. A profile has been developed for the C2 domain that covers the total domain.

- Consensus pattern: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]
- Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive than the pattern, you should use it if you have access to the necessary software tools to do so.

[0275] [1] Medline: 96367095. Extending the C2 domain family: C2s in PKCs delta, epsilon, eta and theta, phospholipases, GAPs and perforin. Ponting CP, Parker PJ; Protein Sci 1996;5:162-166.

[1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992).
[2] Stabel S. Semin. Cancer Biol. 5:277-284(1994).
[3] Brose N., Hofmann K.O., Hata Y., Suedhof T.C. J. Biol. Chem. 270:25273-25280(1995).
[4] Sossin W.S., Schwartz J.H. Trends Biochem. Sci. 18:207-208(1993).
[5] Davletov B.A., Suedhof T.C. J. Biol. Chem. 268:26386-26390(1993).
[6] Fukuda M., Aruga J., Niinobe M., Aimoto S., Mikoshiba K. J. Biol. Chem. 269:29206-29211(1994).
[7] Sutton R.B., Davletov B.A., Berghuis A.M., Suedhof T.C., Sprang S.R. Cell 80:929-938(1995).

[0276] 71. CAP (CAP protein) Number of members: 11
In budding and fission yeasts the CAP protein is a bifunctional protein whose N-terminal domain binds to adenylyl cyclase, thereby enabling that enzyme to be activated by upstream regulatory signals, such as Ras. The function of the C-terminal domain is less clear, but it is required for normal cellular morphology and growth control [1]. CAP is conserved in higher eukaryotic organisms where its function is not yet clear [2].

Structurally, CAP is a protein of 474 to 551 residues which consists of two domains separated by a proline-rich hinge. Two signature patterns, one corresponding to a conserved region in the N-terminal extremity and the other to a C-terminal region, have been developed.

- Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E

- Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K

[1] Kawamukai M., Gerst J., Field J., Riggs M., Rodgers L., Wigler M., Young D. Mol. Biol. Cell 3:167-180(1992).
[2] Yu G., Swiston J., Young D. J. Cell Sci. 107:1671-1678(1994).

[0277] 72. CAP_GLY (CAP-Gly domain)
CAP stands for cytoskeleton-associated proteins. Swiss:P39937 may be a member but has not been included. It has a weak match to the family between residues 22-67. Number of members: 24

[1] Medline: 93242656. Sequence homologies between four cytoskeleton-associated proteins. Riehemann K, Sorg C; Trends Biochem Sci 1993;18:82-83.

[0278] It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence of a conserved, glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins known to contain this domain are listed below.

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein associated with intermediate filaments and that links endocytic vesicles to microtubules. Restin contains two copies of the CAP-Gly domain.
- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila glued, a major component of activator I, a 20S polypeptide complex that stimulates dynein-mediated vesicle transport.
- Yeast protein BIK1, which seems to be required for the formation or stabilization of microtubules during mitosis and for spindle pole body fusion during conjugation.
- Yeast protein NIP100 (NIP80).
- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical protein F53F4.3. These proteins contain an N-terminal ubiquitin domain (see <PDOC00271>) and a C-terminal CAP-Gly domain.
- Caenorhabditis elegans hypothetical protein M01A8.2.
- Yeast hypothetical protein YNL148c.

Structurally, these proteins are made of three distinct parts: an N-terminal section that is most probably globular and contains the CAP-Gly domain, a large central region predicted to be in an alpha-helical coiled-coil conformation and, finally, a short C-terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 residues of the domain and includes five of the six conserved glycines.

- Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-

[1] Riehemann K., Sorg C. Trends Biochem. Sci. 18:82-83(1993).
[0279] 73. (CBD1)

Cellulose-binding domain, fungal type

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1].

[0280] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids.

[0281] The CBD of a number of fungal cellulases has been shown to consist of 36 amino acid residues. Enzymes known to contain such a domain are:

- Endoglucanase I (gene egl1) from Trichoderma reesei.
- Endoglucanase II (gene egl2) from Trichoderma reesei.
- Endoglucanase V (gene egl5) from Trichoderma reesei.
- Exocellobiohydrolase I (gene CBHI) from Humicola grisea, Neurospora crassa, Phanerochaete chrysosporium, Trichoderma reesei, and Trichoderma viride.
- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.
- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.
- Endoglucanases B, C2, F and K from Fusarium oxysporum.

[0282] The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal extremity (Cbh-I, egl1 or egl5) of these enzymes. As shown in the following schematic representation, there are four conserved cysteines in this type of CBD domain, all involved in disulfide bonds.



xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga, Porphyra purpurea [2]. Structurally, this protein consists of four tandem repeats of the CBD domain.

[0284] Consensus pattern: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C [The four C's are involved in disulfide bonds] Sequences known to belong to this class detected by the pattern: ALL.

[1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[2] Liu Q., der Meer J.R., Reith M.E.

[0285] 74. CBS domain. 3D Structure found as a subdomain in TIM barrel of inosine-. CBS domain web page. CBS domains are small intracellular modules mostly found in two or four copies within a protein. CBS domains are found in cystathionine-beta-synthase (CBS), where mutations lead to homocystinuria. Two CBS domains are found in inosine-monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. Two CBS domains are found in intracellular loops of several chloride channels. Mutations in this domain of Swiss:P35520 lead to homocystinuria. Number of members: 414

[1] Medline: 97172695. The structure of a domain common to archaebacteria and the homocystinuria disease protein. Bateman A; Trends Biochem Sci 1997;22:12-13.

[2] Medline: 96279836. Structure and mechanism of inosine monophosphate dehydrogenase in complex with the immunosuppressant mycophenolic acid. Sintchak MD, Fleming MA, Futer O, Raybuck SA, Chambers SP, Caron PR, Murcko MA, Wilson KP; Cell 1996;85:921-930.
Discovery of CBS domain.

[3] Medline: 97259972. CBS domains in ClC chloride channels implicated in myotonia and nephrolithiasis (kidney stones). Ponting CP; J Mol Med 1997;75:160-163.

[0286] 75. CDP-OH_P_transf (CDP-alcohol phosphatidyltransferase)
All of these members have the ability to catalyze the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester bond and concomitant breaking of a phosphoride anhydride bond. Number of members: 32

A number of phosphatidyltransferases, which are all involved in phospholipid biosynthesis and that share the property of catalyzing the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester bond and concomitant breaking of a phosphoride anhydride bond, share a conserved sequence region [1,2]. These enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1).
- Diacylglycerol cholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1).
- Phosphatidylglycerophosphate synthase (EC 2.7.8.5) (CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase) from bacteria (gene pgsA).
- Phosphatidylserine synthase (EC 2.7.8.8) (CDP-diacylglycerol-serine O-phosphatidyltransferase) from yeast (gene CHO1) and from Bacillus subtilis (gene pssA).
- Phosphatidylinositol synthase (EC 2.7.8.11) (CDP-diacylglycerol-inositol 3-phosphatidyltransferase) from yeast (gene PIS).

These enzymes are proteins of from 200 to 400 amino acid residues. The conserved region contains three aspartic acid residues and is located in the N-terminal section of the sequences.

- Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D
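
The consensus patterns in this section use the PROSITE notation: elements are separated by '-', 'x' stands for any residue, square brackets list the residues accepted at a position, curly brackets list residues that are excluded, and a number in parentheses gives a repeat count or range. As an illustration only (the function name `prosite_to_regex` and the toy peptide are assumptions made for this sketch and do not come from this disclosure), a pattern in this notation could be translated into a regular expression and used to scan a sequence roughly as follows:

```python
import re

# One pattern element: 'x', a single residue, [allowed] or {excluded},
# optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        core, low, high = m.groups()
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            token = core                      # single residue or [allowed] set
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)

# Pattern quoted in the text for CDP-alcohol phosphatidyltransferases.
CDP_OH_PATTERN = "D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D"
CDP_OH_REGEX = prosite_to_regex(CDP_OH_PATTERN)

# Hypothetical peptide built by hand to contain the motif (not a real protein).
toy_sequence = "MK" + "DG" + "LL" + "AR" + "L" * 8 + "G" + "LLL" + "D" + "LLL" + "D" + "KK"
hit = re.search(CDP_OH_REGEX, toy_sequence)

print(CDP_OH_REGEX)
print(hit.start() + 1 if hit else None)  # 1-based start of the match, if any
```

The sketch covers only the notation elements that appear in this section; anchors and other less common PROSITE constructs are deliberately left out.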

[1] Medline: 97075020. Two-dimensional 1H-NMR of transmembrane peptides from Escherichia coli phosphatidylglycerophosphate synthase in micelles. Morein S, Trouard TP, Hauksson JB, Rilfors L, Arvidson G, Lindblom G; Eur J Biochem 1996;241:489-497.

[1] Nikawa J.-i., Kodaki T., Yamashita S. J. Biol. Chem. 262:4876-4881(1987).
[2] Hjelmstad R.H., Bell R.M. J. Biol. Chem. 266:5094-5134(1991).

[0287] 76. CHOD (Cholesterol oxidase) Members of the GMC oxidoreductase family Number of members: 3
[0288] [1] Medline: 94032271. Crystal structure of cholesterol oxidase complexed with a steroid substrate: implications for flavin adenine dinucleotide dependent alcohol oxidases. Li J, Vrielink A, Brick P, Blow DM; Biochemistry 1993;32:11507-11515.

[0289] The following FAD flavoproteins oxidoreductases have been found [1,2] to be evolutionary related. These enzymes, which are called 'GMC oxidoreductases', are listed below.

- Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus niger. Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> formaldehyde + hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine aldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide from injured tissues.

These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number of regions of sequence similarities. One of these regions, located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two of these domains have been selected as signature patterns. The first one is located in the N-terminal section of these enzymes, about 50 residues after the ADP-binding domain, while the second one is located in the central section.

- Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]

- Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).
[3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).

[4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0290] 77. CKS (Cyclin-dependent kinase regulatory subunit) Number of members: 11. Cyclin-dependent kinases (CDK) are protein kinases which associate with cyclins to regulate eukaryotic cell cycle progression. The best known CDK is p34-cdc2 (CDC28 in yeast), which is required for entry into S-phase and mitosis. CDKs bind to a regulatory subunit which is essential for their biological function. This regulatory subunit is a small protein of 79 to 150 residues. In yeast (gene CKS1) and in fission yeast (gene suc1) a single isoform is known, while mammals have two highly related isoforms. It has been shown [1] that these CDK regulatory subunits assemble as a hexamer which then acts as a hub for the oligomerization of six CDK catalytic subunits. The sequences of CDK regulatory subunits are highly conserved; therefore, the two most conserved regions have been used as signature patterns.

- Consensus pattern: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]

- Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]

[0291] [1] Parge H.E., Arvai A.S., Murtari D.J., Reed S.I., Tainer J.A. Science 262:387-395(1993).
[0292] 78. CK_II_beta (Casein kinase II regulatory subunit)

Number of members: 16. Casein kinase II (CK-2) [1] is a ubiquitous eukaryotic serine/threonine protein kinase which is found both in the cytoplasm and the nucleus and whose substrates are numerous. It generally phosphorylates Ser or Thr at the N-terminal of a stretch of acidic residues (see <PDOC00006>). CK-2 exists as a heterotetramer composed of two catalytic subunits (alpha) and two regulatory subunits (beta). In most species there are two closely related isoforms of the catalytic subunit: alpha and alpha'. Some species, such as fungi and plants, express two forms of regulatory subunits: beta and beta'. The exact function of the regulatory subunit is not yet known. It is a highly conserved protein of about 25 Kd that contains, in its central section, a cysteine-rich motif that could be involved in binding a metal such as zinc [2]. This region has been used as a signature pattern.

- Consensus pattern: C-P-x-[LIVMY]-x-C-x(5)-[LI]-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C
[1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).
[2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease)

These proteins belong to family S14 in the classification of peptidases. 

-!- The Clp protease has an active site catalytic triad. In E. coli Clp protease, ser-111, his-136 and asp-185 form the catalytic triad.

-!- Swiss:P48254 has lost all of these active site residues and is therefore inactive.

-!- Swiss:P42379 contains two large insertions, Swiss:P42380 contains one large insertion. Number of members: 38

[0294] The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various proteins in a process that requires ATP hydrolysis [1,2]. Clp is a dimeric protein which consists of a proteolytic subunit (gene clpP) and either of two related ATP-binding regulatory subunits (genes clpA and clpX). ClpP is a serine protease which has a chymotrypsin-like activity. Its catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. Proteases highly similar to ClpP have been found to be encoded in the genome of the chloroplast of plants and seem to be also present in other eukaryotes. The sequences around two of the residues involved in the catalytic triad (a serine and a histidine) are highly conserved and can be used as signature patterns specific to that category of proteases.

- Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active site residue]

- Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site residue]

[1] Medline: 98050920. The structure of ClpP at 2.3 angstroms resolution suggests a model for ATP-dependent proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456.
[1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).

[2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992).
[3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[0295] 80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel)
[1] Medline: 94224763. Cyclic nucleotide-gated channels: an expanding new family of ion channels. Yau KW; Proc Natl Acad Sci USA 1994;91:3481-3483.

This family is found to the N-terminus of the cNMP_binding. Number of members: 56. Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp) where such a domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies
of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and
a regulatory chain which contains both copies of the domain. The cGPKs are single chain enzymes that include
the two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to
an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most
cAPK.

- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been fully characterized. One
is found in rod cells where it plays a role in visual signal transduction. It specifically binds to cGMP leading to an
opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar,
cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be
essential for maintenance of the structural integrity of the beta-barrel. Two signature patterns have been developed
for this domain. The first pattern is located within beta-barrels 2 and 3 and contains the first two conserved Gly.
The second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three
other invariant residues.

- Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

- Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0296] 81. COX10_ctaB_cyoE (Cytochrome c oxidase assembly factor)
[1]
Medline: 95191390
Biosynthesis and functional role of haem O and haem A
Mogi T, Saiki K, Anraku Y;
Mol Microbiol 1994;14:391-398.
Cytochrome c oxidase is a multi subunit enzyme. The complexity of this enzyme requires assistance in building the
complex.
This is carried out by the Cytochrome c oxidase assembly factor.
Number of members: 31

[0297] Cytochrome c oxidase is an oligomeric enzymatic complex which seems to require the aid of a number of
proteins that either act as chaperonins to help the subunits of the enzyme to fold correctly, or assist in the assembly
of the metal centers [1]. One of these subunits is known as COX10 in yeast and as ctaB [2] in aerobic prokaryotes. It
is evolutionarily related to cyoE protein from the Escherichia coli cytochrome O terminal oxidase complex.
[0298] These proteins probably contain [3] seven transmembrane segments. The most conserved region is located
in a loop between the second and third of these segments and has been selected as a signature pattern.

- Consensus pattern: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G
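The consensus patterns listed throughout this description share one notation: positions are separated by hyphens, x stands for any residue, square brackets list the residues accepted at a position, and a number in parentheses gives a repeat count or range. As a minimal sketch only, assuming Python and a hypothetical helper named prosite_to_regex (neither appears in the original text), such a pattern could be translated into a regular expression and used to scan a protein sequence. The test sequence is invented for illustration and is not a real protein.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a regex string.

    Hypothetical helper for illustration; supports x, single residues,
    [..] alternatives, {..} exclusions, and (n) or (n,m) repeat counts.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        # Separate the core symbol from an optional repeat such as (4) or (6,7).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()
        if core == "x":
            atom = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            atom = core                     # set of accepted residues
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"  # set of excluded residues
        elif len(core) == 1 and core.isalpha():
            atom = core                     # one fixed residue
        else:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(atom + repeat)
    return "".join(parts)


# The signature pattern given above for this family.
regex = prosite_to_regex("[ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G")

# Invented test sequence containing one match starting at index 2.
hit = re.search(regex, "MMEADAAMARTAARAAAAGKK")
print(regex, hit.start() if hit else None)
```

A regular expression is used here because the notation maps onto it almost one to one; a profile-based search would be needed for anything beyond exact pattern matching.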

[ 1] Nobrega M.P., Nobrega F.G., Tzagoloff A.
J. Biol. Chem. 265:14220-14226(1990).
[ 2] Cao J., Hosler J., Shapleigh J., Revzin A., Ferguson-Miller S.
J. Biol. Chem. 267:24273-24278(1992).
[ 3] Chepuri V., Gennis R.B.
J. Biol. Chem. 265:12978-12986(1990).

[0299] 82. COX3 (Cytochrome c oxidase subunit III)
This family corresponds to chains c and p.
[1]
Medline: 96216288
The whole structure of the 13-subunit oxidized cytochrome c
oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H,
Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
Number of members: 224

[0300] 83. COX5B (Cytochrome c oxidase subunit Vb)
[1]
Medline: 96216286
The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
This family consists of chains F and S
Number of members: 10

[0301] Cytochrome c oxidase (EC 1.9.3.1) [1] is an oligomeric enzymatic complex which is a component of the
respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this
enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma
membrane. In addition to the three large subunits that form the catalytic center of the enzyme complex there are, in
eukaryotes, a variable number of small polypeptidic subunits. One of these subunits, which is known as Vb in mammals,
V in slime mold and IV in yeast, binds a zinc atom. The sequence of subunit Vb is well conserved and includes three
conserved cysteines that are thought to coordinate the zinc ion [2]. Two of these cysteines are clustered in the
C-terminal section of the subunit; this region has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two Cs probably bind zinc]
[ 1] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-148(1983).
[ 2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991).

[0302] 84. COesterase (Carboxylesterases)
Cholinesterase pages
The prints entry is specific to acetylcholinesterase
Number of members: 273

[0303] Higher eukaryotes have many distinct esterases. Among the different types are those which act on carboxylic
esters (EC 3.1.1.-). Carboxyl-esterases have been classified into three categories (A, B and C) on the basis of
differential patterns of inhibition by organophosphates. The sequence of a number of type-B carboxylesterases indicates
[1,2,3] that the majority are evolutionarily related. This family currently consists of the following proteins:

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from Drosophila.
- Mammalian cholinesterase II (butyryl cholinesterase) (EC 3.1.1.8). Acetylcholinesterase and cholinesterase II are
closely related enzymes that hydrolyze choline esters [4].
- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1).
- Drosophila esterase 6, produced in the anterior ejaculatory duct of the male insect reproductive system where it
plays an important role in its reproductive biology.
- Drosophila esterase P.
- Culex pipiens (mosquito) esterases B1 and B2.
- Myzus persicae (peach-potato aphid) esterases E4 and FE4.
- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase which catalyzes fat and vitamin absorption.
It is activated by bile salts in infant intestine where it helps to digest milk fats.
- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).
- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.
- Caenorhabditis gut esterase (gene ges-1).
- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.14), an enzyme that may be associated with peroxisome
proliferation and may play a role in the production of 3-hydroxy fatty acid diester pheromones.
- Membrane enclosed crystal proteins from slime mold. These proteins are most probably esterases; the vesicles
where they are found have therefore been termed esterosomes.

[0304] So far two bacterial proteins have been found to belong to this family:

- Phenmedipham hydrolase (phenylcarbamate hydrolase), an Arthrobacter oxidans plasmid-encoded enzyme (gene
pcd) that degrades the phenylcarbamate herbicides phenmedipham and desmedipham by hydrolyzing their central
carbamate linkages.
- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA).

[0305] The following proteins, while having lost their catalytic activity, contain a domain evolutionarily related to that
of carboxylesterases type-B:

- Thyroglobulin (TG), a glycoprotein specific to the thyroid gland, which is the precursor of the iodinated thyroid
hormones thyroxine (T4) and triiodothyronine (T3).
- Drosophila protein neurotactin (gene nrt) which may mediate or modulate cell adhesion between embryonic cells
during development.
- Drosophila protein glutactin (gene glt), whose function is not known.

[0306] As is the case for lipases and serine proteases, the catalytic apparatus of esterases involves three residues
(catalytic triad): a serine, a glutamate or aspartate and a histidine. The sequence around the active site serine is well
conserved and can be used as a signature pattern. A conserved region located in the N-terminal section containing a
cysteine involved in a disulfide bond has been selected as a second signature pattern.

- Consensus pattern: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G [S is the active site residue]

- Consensus pattern: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR] [C is involved in a disulfide bond]

[ 1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).
[ 2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A. 88:6647-6651(1991).
[ 3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P. Protein Sci. 2:366-382(1993).
[ 4] Lockridge O. BioEssays 9:125-128(1988).
[ 5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993).

[0307] 85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase))
[1]
Medline: 94347758
Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM;
Biochemistry 1994;33:10249-10256.
[2]
Medline: 90285162
Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/
or pyrimidines [2].
The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_sm_chain.
The small chain has a GATase domain in the carboxyl terminus.
See GATase.
Number of members: 181

[0308] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0309] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0310] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.
[0311] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.
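The two figures quoted above (a domain of about 120 Kd built from two subdomains of about 500 amino acids each) can be cross-checked with a rough calculation. The sketch below is illustrative only and assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in the original text; the function name is hypothetical.

```python
# Assumption (not from the original text): an average amino acid residue
# contributes roughly 110 Da to the mass of a polypeptide.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimated_mass_kd(n_residues: int) -> float:
    """Return a rough polypeptide mass in kilodaltons for n_residues residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


subdomain_residues = 500  # "about 500 amino acids" per ancestral subdomain
duplicated_domain_kd = estimated_mass_kd(2 * subdomain_residues)
print(f"Estimated mass of the duplicated domain: {duplicated_domain_kd:.0f} Kd")
```

Under that assumption two subdomains of about 500 residues come to roughly 110 Kd, which is in line with the stated size of about 120 Kd given how approximate both inputs are.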

[0312] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0313] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]
- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.
J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.
BioEssays 15:157-164(1993).

[0314] 86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain)
[1]
Medline: 90285162
Mammalian carbamyl phosphate synthetase (CPS). DNA sequence and evolution of the CPS domain of the Syrian
hamster multifunctional protein CAD.
Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.
The carbamoyl-phosphate synthase domain is in the amino terminus of protein.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or
ammonia and bicarbonate. This important enzyme initiates both the urea cycle and the biosynthesis of arginine and/
or pyrimidines [1].
The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and large chain. The
small chain promotes the hydrolysis of glutamine to ammonia, which is used by the large chain to synthesize carbamoyl
phosphate. See CPSase_L_chain.
The small chain has a GATase domain in the carboxyl terminus.
See GATase.
Number of members: 46

[0315] Carbamoyl-phosphate synthase (CPSase) catalyzes the ATP-dependent synthesis of carbamyl-phosphate
from glutamine (EC 6.3.5.5) or ammonia (EC 6.3.4.16) and bicarbonate [1]. This important enzyme initiates both the
urea cycle and the biosynthesis of arginine and pyrimidines.

[0316] Glutamine-dependent CPSase (CPSase II) is involved in the biosynthesis of pyrimidines and purines. In
bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways while other bacteria have
separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides
glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a
large chain (gene carB) that provides CPSase activity. Such a structure is also present in fungi for arginine biosynthesis
(genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large
multifunctional enzyme - called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase
domain is located between an N-terminal GATase domain and the C-terminal part which encompasses the dihydroorotase
and aspartate transcarbamylase activities.

[0317] Ammonia-dependent CPSase (CPSase I) is involved in the urea cycle in ureolytic vertebrates; it is a
monofunctional protein located in the mitochondrial matrix.

[0318] The CPSase domain is typically 120 Kd in size and has arisen from the duplication of an ancestral subdomain
of about 500 amino acids. Each subdomain independently binds to ATP and it is suggested that the two homologous
halves act separately, one to catalyze the phosphorylation of bicarbonate to carboxy phosphate and the other that of
carbamate to carbamyl phosphate.

[0319] The CPSase subdomain is also present in a single copy in the biotin-dependent enzymes acetyl-CoA
carboxylase (EC 6.4.1.2) (ACC), propionyl-CoA carboxylase (EC 6.4.1.3) (PCCase), pyruvate carboxylase (EC 6.4.1.1)
(PC) and urea carboxylase (EC 6.3.4.6).

[0320] Two conserved regions which are probably important for binding ATP and/or catalytic activity have been
selected as signatures for the subdomain.

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R. J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)
[1]
Medline: 98121119
Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer protein.
Sha B, Phillips SE, Bankaitis VA, Luo M;
Nature 1998;391:506-510.
The original profile has been extended to include the carboxyl domain from the known structure of Sec14. Swiss:
P10911 has not been included in the Pfam family because it does not appear to contain a complete structural domain.
Number of members: 39

[0322] 88. CSD ('Cold-shock' DNA-binding domain)
[1]
Medline: 94255482
Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W, Inouye M, Heinemann U;
Proc Natl Acad Sci U S A 1994;91:5119-5123.
Number of members: 121

[0323] A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding
proteins [1,2,3,E1]. This domain, which is known as the 'cold-shock domain' (CSD), is present in the proteins listed below.

- Escherichia coli protein CS7.4 (gene cspA) which is induced in response to low temperature (cold-shock protein)
and which binds to and stimulates the transcription of the CCAAT-containing promoters of the HN-S protein and
of gyrA.
- Mammalian Y box binding protein 1 (YB1). A protein that binds to the CCAAT-containing Y box of mammalian HLA
class II genes.
- Xenopus Y box binding proteins -1 and -2 (Y1 and Y2). Proteins that bind to the CCAAT-containing Y box of
Xenopus hsp70 genes.
- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element of genes transcribed by RNA
polymerase III.
- Enhancer factor I subunit A (EFI-A) (dbpB). A protein that also binds to the CCAAT motif in various gene promoters.
- DbpA, a human DNA-binding protein of unknown specificity.
- Bacillus subtilis cold-shock proteins cspB and cspC.
- Streptomyces clavuligerus protein SC 7.0.
- Escherichia coli proteins cspB, cspC, cspD, cspE and cspF.
- Unr, a mammalian gene encoded upstream of the N-ras gene. Unr contains nine repeats that are similar to the
CSD domain. The function of Unr is not yet known but it could be a multivalent DNA-binding protein.

[0324] As a signature pattern for the CSD domain, its most conserved region which is located in its N-terminal section
has been selected. It must be noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G.
New Biol. 4:389-395(1992).
[ 2] Wistow G.
Nature 344:823-824(1990).
[ 3] Jones P.G., Inouye M.
Mol. Microbiol. 11:811-818(1994).
[ 4] Landsman D.
Nucleic Acids Res. 20:2861-2864(1992).

[0325] 89. CTF_NFI (CTF/NF-I family)
Number of members: 45

[0326] Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding
proteins) are a family of vertebrate nuclear proteins which recognize and bind, as dimers, the palindromic DNA
sequence 5'-TGGCANNNTGCCA-3'. CTF/NF-I binding sites are present in viral and cellular promoters and in the origin
of DNA replication of Adenovirus type 2.
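The binding site quoted above is palindromic in the DNA sense: it reads the same on both strands, so it equals its own reverse complement. A minimal Python sketch of that check, and of locating candidate sites in a sequence, follows. The function names and the test sequence are hypothetical illustrations and are not taken from the original text; N is treated as any of A, C, G or T.

```python
import re

# Complement table for DNA bases; N (any base) pairs with N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# The recognition sequence given in the text, written 5' to 3'.
SITE = "TGGCANNNTGCCA"

# Regex form of the site: each N becomes a character class of the four bases.
SITE_REGEX = re.compile(SITE.replace("N", "[ACGT]"))


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(dna: str) -> list:
    """Return 0-based start positions of non-overlapping matches to the site."""
    return [m.start() for m in SITE_REGEX.finditer(dna.upper())]


# A palindromic site is identical to its own reverse complement.
print(reverse_complement(SITE) == SITE)

# Invented example sequence with one candidate site starting at position 2.
print(find_sites("aaTGGCAcgtTGCCAtt"))
```

Because the site is its own reverse complement, scanning one strand is enough to find occurrences on either strand.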

[0327] The CTF/NF-I proteins were first identified as nuclear factor I, a collection of proteins that activate the
replication of several Adenovirus serotypes (together with NF-II and NF-III) [3]. The family of proteins was also identified
as the CTF transcription factors, before the NFI and CTF families were found to be identical [4]. The CTF/NF-I proteins
are individually capable of activating transcription and DNA replication. The CTF/NF-I family name has also been
dubbed as NFI, NF-I or NF1.

[0328] In a given species, there are a large number of different CTF/NF-I proteins. The multiplicity of CTF/NF-I is
known to be generated both by alternative splicing and by the occurrence of four different genes. The known forms of
NF-I genes have been classified as:

- The CTF-like factors subfamily (prototype form: CTF-1) [4].
- The NFI-X proteins.
- The NFI-A proteins.
- The NFI-B proteins.
[0329] So far, all CTF/NF-I family members appear to have similar transcription and replication activities.
[0330] CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid sequence, almost
perfectly conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerization and
Adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This
activation domain is the target of gene expression regulatory pathways elicited by growth factors and it interacts with basal
transcription factors and with histone H3 [6].

[0331] A perfectly conserved, highly charged 12 residue peptide located in the N-terminal part of CTF/NF-I has been
selected as a specific signature for this family of proteins.

- Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R

[ 1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R.
Cell 58:741-753(1989).
[ 2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K., Sippel A.E.
Nucleic Acids Res. 18:2607-2616(1990).
[ 3] Nagata K., Guggenheimer R.A., Enomoto T., Lichy J.H., Hurwitz J.
Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982).
[ 4] Santoro C., Mermod N., Andrews P.C., Tjian R.
Nature 334:218-224(1988).
[ 5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S., Osborne T.F.
Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).
[ 6] Alevizopoulos A., Dusserre Y., Tsai-Pflugfelder M., von der Weid T., Wahli W., Mermod N.
Genes Dev. 9:3051-3066(1995).

[0332] 90. Calsequestrin (Calsequestrin)
Number of members: 13

[0333] Calsequestrin is a moderate-affinity, high-capacity calcium-binding protein of cardiac and skeletal muscle [1],
where it is located in the lumenal space of the sarcoplasmic reticulum terminal cisternae. Calsequestrin acts as a
calcium buffer and plays an important role in the muscle excitation-contraction coupling. It is a highly acidic protein of
about 400 amino acid residues that binds more than 40 moles of calcium per mole of protein. There are at least two
different forms of calsequestrin: one which is expressed in cardiac muscles and another in skeletal muscles. Both
forms have highly similar sequences.

[0334] Two signature sequences have been developed. The first corresponds to the N-terminus of the mature protein,
the second is located just in front of the C-terminus of the protein which is composed of a highly acidic tail of variable
length.

- Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V

- Consensus pattern: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D

[0335] [ 1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.
Biochem. J. 283:767-772(1992).
[0336] 91. Carboxyl_trans (Carboxyl transferase domain)
[1]
Medline: 93374821
Primary structure of the monomer of the 12S subunit of transcarboxylase as deduced from DNA and characterization
of the product expressed in Escherichia coli.
Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM,
Magner WJ, Shenoy BC, Wood HG, Samols D; J Bacteriol 1993;175:5301-5308.
[2]
Medline: 93358891
Molecular evolution of biotin-dependent carboxylases.
Toh H, Kondo H, Tanabe T;
Eur J Biochem 1993;215:687-696.
All of the members in this family are biotin dependent carboxylases.
The carboxyl transferase domain carries out the following reaction: transcarboxylation from biotin to an acceptor
molecule. There are two recognised types of carboxyl transferase. One of them uses acyl-CoA and the other uses 2-oxo
acid as the acceptor molecule of carbon dioxide.
All of the members in this family utilise acyl-CoA as the acceptor molecule.
Number of members: 47

[0337] 92. Chal_stil_synt (Chalcone and stilbene synthases)
Number of members: 146

[0338] Chalcone synthases (CHS) (EC 2.3.1.74) and stilbene synthases (STS) (formerly known as resveratrol
synthases) are related plant enzymes [1]. CHS is an important enzyme in flavonoid biosynthesis and STS a key enzyme
in stilbene-type phytoalexin biosynthesis. Both enzymes catalyze the addition of three molecules of malonyl-CoA to a
starter CoA ester (a typical example is 4-coumaroyl-CoA), producing either a chalcone (with CHS) or stilbene (with
STS).

[0339] These enzymes are proteins of about 390 amino-acid residues. A conserved cysteine residue, located in the
central section of these proteins, has been shown [2] to be essential for the catalytic activity of both enzymes and
probably represents the binding site for the 4-coumaryl-CoA group. The region around this active site residue is well
conserved and can be used as a signature pattern.

[0340] In addition to the plant enzymes, this family also includes Bacillus subtilis bcsA.

- Consensus pattern: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GA]-G-[GA]-[STAV]-x-[LIVMF]-[RA] [C is the
active site residue]

[ 1] Schroeder J., Schroeder G.
Z. Naturforsch. 45C:1-8(1990).
[ 2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G.
J. Biol. Chem. 266:9971-9976(1991).

[0341] 93. Chorismate_synt (Chorismate synthase) 

Number of members: 19

[0342] Chorismate synthase (EC 4.6.1.4) catalyzes the last of the seven steps in the shikimate pathway which is
used in prokaryotes, fungi and plants for the biosynthesis of aromatic amino acids. It catalyzes the 1,4-trans elimination
of the phosphate group from 5-enolpyruvylshikimate-3-phosphate (EPSP) to form chorismate which can then be used
in phenylalanine, tyrosine or tryptophan biosynthesis. Chorismate synthase requires the presence of a reduced flavin
mononucleotide (FMNH2 or FADH2) for its activity.

[0343] Chorismate synthase from various sources shows [1,2] a high degree of sequence conservation. It is a protein
of about 360 to 400 amino-acid residues. Three signature patterns have been developed from conserved regions rich
in basic residues (mostly arginines). The first is in the N-terminal section, the second is central and the third is C-terminal.

- Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV]

- Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G

- Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM]

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N.
J. Biol. Chem. 266:21434-21438(1991).
[ 2] Jones D.G.L., Reusser U., Braus G.H.
Mol. Microbiol. 5:2143-2152(1991).

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)
Number of members: 21

[0345] Clathrin coated vesicles (CCV) mediate intracellular membrane traffic such as receptor mediated endocytosis.
In addition to clathrin, the CCV are composed of a number of other components including oligomeric complexes which
are known as adaptor or clathrin assembly proteins (AP) complexes [1]. The adaptor complexes are believed to interact
with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals two types of
adaptor complexes are known: AP-1 which is associated with the Golgi complex and AP-2 which is associated with
the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains - the adaptins -
(gamma and beta' in AP-1; alpha and beta in AP-2); a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain
(AP19 in AP-1; AP17 in AP-2).

[0346] The small chains of AP-1 and AP-2 are evolutionarily related proteins of about 18 Kd. Homologs of AP17 and
AP19 have also been found in yeast (genes APS1/YAP19 and APS2/YAP17) [2,3,4]. AP17 and AP19 are also related
to the zeta-chain [5] of coatomer (zeta-cop), a cytosolic protein complex that reversibly associates with Golgi
membranes to form vesicles that mediate biosynthetic protein transport from the endoplasmic reticulum, via the Golgi up
to the trans Golgi network.
[0347] A conserved region in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F

[ 1] Pearse B.M., Robinson M.S.
Annu. Rev. Cell Biol. 6:151-171(1990).
[ 2] Kirchhausen T., Davis A.C., Frucht S., O'Brine Greco B., Payne G.S., Tubb B.
J. Biol. Chem. 266:11153-11157(1991).
[ 3] Nakai M., Takada T., Endo T.
Biochim. Biophys. Acta 1174:262-284(1993).
[ 4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S.
EMBO J. 13:1706-1717(1994).
[ 5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G., Wieland F.T., Rothman J.E.
J. Cell Biol. 123:1727-1734(1993).

[0348] 95. Clathrin_lg_ch (Clathrin light chain)
Number of members: 8

[0349] Clathrin [1,2] is the major coat-forming protein that encloses vesicles such as coated pits and forms cell
surface patches involved in membrane traffic within eukaryotic cells. The clathrin coats (called triskelions) are
composed of three heavy chains (180 Kd) and three light chains (23 to 27 Kd).

[0350] The clathrin light chains [3], which may help to properly orient the assembly and disassembly of the clathrin
coats, bind non-covalently to the heavy chain; they also bind calcium and interact with the hsc70 uncoating ATPase.

- In higher eukaryotes two genes code for distinct but related light chains: LC(a) and LC(b). Each of the two genes
can yield, by tissue-specific alternative splicing, two separate forms which differ by the insertion of a sequence of
respectively thirty or eighteen residues. There is, in the N-terminal part of the clathrin light chains, a domain of
twenty one amino acid residues which is perfectly conserved in LC(a) and LC(b).
- In yeast there is a single light chain (gene CLC1) whose sequence is only distantly related to that of higher
eukaryotes.

[0351] Two signature patterns have been developed for clathrin light chains. The first pattern is a heptapeptide from
the center of the conserved N-terminal region of eukaryotic light chains; the second pattern is derived from a positively
charged region located in the C-terminal extremity of all known clathrin light chains.

- Consensus pattern: F-L-A-Q-Q-E-S

[ 1] Keen J.H.
Annu. Rev. Biochem. 59:415-438(1990).
[ 2] Brodsky F.M.
Science 242:1396-1402(1988).
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H., Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).

[0352] 96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS
Each repeat is about 140 amino acids long. The repeats occur in the arm region of the Clathrin heavy chain.
Number of members: 79
[1]
Medline: 92191269
Folding and trimerization of clathrin subunits at the triskelion hub.
Nathke IS, Heuser J, Lupas A, Stock J, Turck CW, Brodsky FM;
Cell 1992;68:899-910.
[2]
Medline: 88097376
Clathrin heavy chain: molecular cloning and complete primary structure.
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;






Proc Natl Acad Sci U S A 1987;84:8805-8809. 
[0353] 97. Collagen (Collagen triple helix repeat (20 copies)) 
[1] Medline: 94059583 
New members of the collagen superfamily.
Mayne R, Brewton RG;
Curr Opin Cell Biol 1993;5:883-890.
Scurvy is associated with collagens.
Members of this family belong to the collagen superfamily [1].

Collagens are generally extracellular structural proteins involved in formation of connective tissue structure. 
The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine;
the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are
post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the
cause of scurvy.

Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple
helical structure.

Number of members: 2125 
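
The G-X-Y periodicity described above lends itself to a simple scan. The following is a minimal sketch, not part of the original disclosure, that reports the longest run of consecutive G-X-Y triplets in a one-letter amino acid sequence. The function name `longest_gxy_run` and the example sequence are illustrative assumptions; hydroxyproline is not distinguished from proline, since it is not encoded in a primary sequence.

```python
import re

def longest_gxy_run(seq: str) -> int:
    """Return the largest number of consecutive G-X-Y triplets found in seq."""
    best = 0
    # The lookahead lets a run start at every position, so a run that begins
    # inside an earlier, shorter run is not skipped.
    for m in re.finditer(r"(?=((?:G[A-Z]{2})+))", seq.upper()):
        best = max(best, len(m.group(1)) // 3)
    return best

# Hypothetical fragment: three G-X-Y triplets flanked by unrelated residues.
example = "MAGPPGAPGEKLL"
run_length = longest_gxy_run(example)
```

A threshold on the run length (for instance 20, matching the number of copies in the alignment) could then be used as a coarse filter before a proper profile search.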

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase)
Number of members: 12 

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] catalyzes the oxidative decarboxylation
of coproporphyrinogen III into protoporphyrinogen IX, a common step in the pathway for the biosynthesis of porphyrins
such as heme, chlorophyll or cobalamin.

[0355] Coproporphyrinogen III oxidase is an enzyme that requires iron for its activity. A cysteine seems to be important
for the catalytic mechanism [3]. Sequences from a variety of eukaryotic and prokaryotic sources show that this enzyme
has been evolutionarily conserved. A highly conserved region in the central part of the sequence has been selected
as a signature pattern. This region contains the only conserved cysteine and is rich in charged amino acids.

- Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D
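
Consensus patterns of this kind follow the PROSITE convention: `x` is any residue, `[ABC]` lists allowed residues, `{ABC}` lists excluded residues, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. The sketch below, which is an illustration rather than the tool used to derive the patterns, translates such a pattern into a Python regular expression. The function name `prosite_to_regex` and the test sequence are assumptions; the sequence is synthetic and is not a real coproporphyrinogen III oxidase.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        # Split "core(n)" or "core(n,m)" into the residue spec and the counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # "[ABC]" and single letters are already valid regex atoms.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)

SIGNATURE = "K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D"
regex = prosite_to_regex(SIGNATURE)

# Synthetic sequence built to satisfy the pattern, padded on both sides.
hit = re.search(regex, "MM" + "KAWCAAFYHLAHRAEARGLGGVFFD" + "TT")
```

The same function can be applied to the other consensus patterns in this section, provided they are first transcribed into clean PROSITE syntax.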

[1] Xu K., Elliott T.
J. Bacteriol. 175:4990-4999(1993).
[2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S.
J. Biol. Chem. 268:21359-21363(1993).
[3] Camadro J.M., Chambon H., Jolles J., Labbe P.
Eur. J. Biochem. 156:579-587(1986).
[4] Xu K., Elliott T.
J. Bacteriol. 176:3196-3203(1994).

[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein)
[1]
Medline: 98087828
Identification of a specific interaction between the
coronavirus mouse hepatitis virus A59 nucleocapsid protein
and packaging signal.
Molenkamp R, Spaan WJ;
Virology 1997;239:78-86.

Number of members: 44 
[0357] 100. Cu-oxidase (Multicopper oxidase)
[1]
Medline: 90126844
The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin.
Modelling and structural relationships.
Messerschmidt A, Huber R;
Eur J Biochem 1990;187:341-352.
Number of members: 150

[0358] Multicopper oxidases [1,2] are enzymes that possess three spectroscopically different copper centers. These
centers are called: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to
this family are:

- Laccase (EC 1.10.3.2) (urishiol oxidase), an enzyme found in fungi and plants, which oxidizes many different types
of phenols and diamines.

- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

- Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of mammals and birds, which oxidizes a
great variety of inorganic and organic substances. Structurally ceruloplasmin exhibits internal sequence homology,
and seems to have evolved from the triplication of a copper-binding domain similar to that found in laccase and
ascorbate oxidase.

[0359] In addition to the above enzymes there are a number of proteins which, on the basis of sequence similarities, 
can be said to belong to this family. These proteins are:

- Copper resistance protein A (copA) from a plasmid in Pseudomonas syringae. This protein seems to be involved
in the resistance of the microbial host to copper.

- Blood coagulation factor V (Fa V).

- Blood coagulation factor VIII (Fa VIII) [E1].

- Yeast FET3 [3], which is required for ferrous iron uptake.

- Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast homolog.

[0360] Factors V and VIII act as cofactors in blood coagulation and are structurally similar [4]. Their sequence consists
of a triplicated A domain, a B domain and a duplicated C domain, in the following order: A-A-B-A-C-C. The A-type
domain is related to the multicopper oxidases.

[0361] Two signature patterns have been developed for these proteins. Both patterns are derived from the same 
region, which in ascorbate oxidase, laccase, in the third domain of ceruloplasmin, and in copA, contains five residues
that are known to be involved in the binding of copper centers. The first pattern does not make any assumption on the
presence of copper-binding residues and thus can detect domains that have lost the ability to bind copper (such as
those in Fa V and Fa VIII), while the second pattern is specific to copper-binding domains.

- Consensus pattern: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]

- Consensus pattern: H-C-H-x(3)-H-x(3)-[AG]-[LM] [The first two H's are copper type 3 binding residues] [The C,
the 3rd H, and L or M are copper type 1 ligands]
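
Because the first pattern is permissive and the second requires the copper ligands, applying both gives a rough two-level classification. The sketch below illustrates that reading. The regular expressions are hand transcriptions of the two consensus patterns, and the function name `classify_domain`, the labels it returns, and the test sequences are hypothetical.

```python
import re

# Hand-transcribed from the two consensus patterns above.
PERMISSIVE = re.compile(r"G.[FYW].[LIVMFYW].[CST].{8}G[LM].{3}[LIVMFYW]")
COPPER_SPECIFIC = re.compile(r"HCH.{3}H.{3}[AG][LM]")

def classify_domain(seq: str) -> str:
    """Label a sequence by which multicopper oxidase signature it matches."""
    if COPPER_SPECIFIC.search(seq):
        return "copper-binding domain"
    if PERMISSIVE.search(seq):
        # Expected for domains such as those of Fa V and Fa VIII.
        return "related domain, copper binding not indicated"
    return "no match"

# Synthetic sequences constructed to satisfy one pattern each.
permissive_only = "MM" + "GAFALAC" + "A" * 8 + "GL" + "AAA" + "L"
copper_like = "MM" + "HCHAAAHAAAGL"
```

A match to either pattern is only a signature hit; it does not by itself establish that a protein binds copper.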

[0362] 101. Cullin (Cullin family)
Number of members: 24

[0363] The following proteins are collectively termed cullins [1]:

- Caenorhabditis elegans cul-1 (or lin-19), a protein required for developmentally programmed transitions from the
G1 phase of the cell cycle to the G0 phase or the apoptotic pathway.

- Caenorhabditis elegans cul-2, cul-3, cul-4 (F45E12.3), cul-5 (ZK856.1) and cul-6 (K08E7.7).

- Mammalian CUL1, CUL2, CUL3, CUL4A and CUL4B.

- Mammalian vasopressin-activated calcium-mobilizing receptor (VACM-1), a kidney-specific protein thought to form
a cell surface receptor [2] but which does not have any structural hallmarks of a receptor.

- Drosophila lin19.

- Yeast CDC53 [3], which acts in concert with CDC4 and UBC3 (CDC34) to control the G1-to-S phase transition.

- Yeast hypothetical protein YGR003w.

- Fission yeast hypothetical protein SpAC24H6.03.

[0364] The cullins are hydrophilic proteins of 740 to 815 amino acids. The C-terminal extremity is the most conserved
part of these proteins. A signature pattern has been developed from that region.

- Consensus pattern: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-Y-x-[SA]>

[1] Kipreos E.T., Lander L.E., Wing J.P., He W.W., Hedgecock E.M.
Cell 85:829-839(1996).
[2] Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi P., Meyer J.M.,
Dewitt D.L.
Am. J. Physiol. 268:F1198-F1210(1995).
[3] Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R.,
Byers B., Goebl M.G.
Mol. Cell. Biol. 16:6634-6643(1996).

[0365] 102. (Cu_amine_oxid)
Copper amine oxidase signatures
Amine oxidases (AO) [1] are enzymes that catalyze the oxidation of a wide range of biogenic amines including many
neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing (EC
1.4.3.4) and copper-containing (EC 1.4.3.6).

[0366] Copper-containing AO is found in bacteria, fungi, plants and animals. It is a homodimeric enzyme that binds
one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine quinone (or topaquinone) (TPQ) cofactor. This
cofactor is derived from a tyrosine residue.

[0367] Two signature patterns were derived for copper AO: the first one contains the tyrosine which gives rise to the
TPQ cofactor, while the second one contains one of the three histidines that bind the copper atom [2].
[0368] Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y gives rise to TPQ]
Sequences known to belong to this class detected by the pattern: ALL.

[0369] Consensus pattern: T-x-[GS]-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand]
Sequences known to belong to this class detected by the pattern: ALL, except for lentil AO.

[1] Knowles P.F., Dooley D.M. (In) Metal ions in biological systems; Sigel H., Sigel A., Eds.; 30:361-403, Marcel
Dekker, New-York, (1993).

[2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S., Phillips S.E.V., McPherson
M.J., Knowles P.F. Structure 3:1171-1184(1995).

[0370] 103. Cys-protease (Cysteine protease)
Number of members: 358

[0371] Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an
asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family
are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an
N-terminal catalytic domain and a C-terminal calcium-binding domain.

- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].

- Human cathepsin O [4].

- Bleomycin hydrolase, an enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1; rice bean SH-EP; kiwi fruit actinidin
(EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
protease IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape
COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A
and RD21A.

- House-dust mite allergens DerP1 and EurM1.

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and
cpr-6), Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.

- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.

- Baculoviruses cathepsin-like enzyme (v-cath).

- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.

- Yeast thiol protease BLH1/YCP1/LAP3.

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[0372] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].

- Thiol protease tpr from Porphyromonas gingivalis.

[0373] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine. 

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced
by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see
<PDOC00382>).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

[0374] The sequences around the three active site residues are well conserved and can be used as signature
patterns.

- Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]

- Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site
residue]

- Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-
[LIVMF] [N is the active site residue]
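
Each signature is annotated with the residue that acts in catalysis, so a match can also report where the candidate active-site residue lies. The sketch below does this for the first two patterns (the cysteine and histidine signatures) using capture groups. The regular expressions are hand transcriptions, and the function name `active_site_positions` and the test sequence are hypothetical; the sequence is synthetic and does not correspond to a real protease.

```python
import re

# The parenthesized residue in each expression is the annotated active-site residue.
CYS_SITE = re.compile(r"Q.{3}[GE].(C)[YW].{2}[STAGC][STAGCV]")
HIS_SITE = re.compile(r"[LIVMGSTAN].(H)[GSACE][LIVM].[LIVMAT]{2}G.[GSADNH]")

def active_site_positions(seq: str) -> dict:
    """Return 1-based positions of the candidate catalytic Cys and His, or None."""
    positions = {}
    for name, signature in (("Cys", CYS_SITE), ("His", HIS_SITE)):
        match = signature.search(seq)
        positions[name] = match.start(1) + 1 if match else None
    return positions

# Synthetic sequence: a Cys-signature segment, a spacer, and a His-signature segment.
synthetic = "MM" + "QAAAGACYAASS" + "TTTT" + "LAHGLALLGAG"
sites = active_site_positions(synthetic)
```

For the inactive homologs mentioned above, where the cysteine is replaced by glycine or serine, the first signature would be expected to fail while the second may still match.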

[1] Dufour E.
Biochimie 70:1335-1342(1988).
[2] Kirschke H., Barrett A.J., Rawlings N.D.
Protein Prof. 2:1587-1643(1995).
[3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J.
FEBS Lett. 357:129-134(1995).
[4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C.
J. Biol. Chem. 269:27136-27142(1994).
[5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C.
Appl. Environ. Microbiol. 59:330-333(1993).
[6] Higgins D.G., McConnell D.J., Sharp P.M.
Nature 340:604-604(1989).
[7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 244:461-486(1994).

[0375] 104. Cys_Met_Meta_PP (Cys/Met metabolism PLP-dependent enzyme)
[1] Medline: 96428687
Crystal structure of the pyridoxal-5'-phosphate dependent cystathionine beta-lyase from Escherichia coli at 1.83 A.
Clausen T, Huber R, Laber B, Pohlenz HD, Messerschmidt A;
J Mol Biol 1996;262:202-224.
[2] Medline: 99059720
Crystal structure of Escherichia coli cystathionine gamma-synthase at 1.5 A resolution.
Clausen T, Huber R, Prade L, Wahl MC, Messerschmidt A;
EMBO J 1998;17:6827-6838.
Database Reference: SCOP; 1cs1; fa;
This family includes enzymes involved in cysteine and methionine metabolism. The following are members:

Cystathionine gamma-lyase,
Cystathionine gamma-synthase,
Cystathionine beta-lyase,
Methionine gamma-lyase,
OAH/OAS sulfhydrylase,
O-succinylhomoserine sulfhydrylase.

All of these members participate in slightly different reactions.
All these enzymes use PLP (pyridoxal-5'-phosphate) as a cofactor.
Number of members: 52

[0376] A number of pyridoxal-dependent enzymes involved in the metabolism of cysteine, homocysteine and
methionine have been shown [1,2] to be evolutionarily related. These are:

- Cystathionine gamma-lyase (EC 4.4.1.1) (gamma-cystathionase), which catalyzes the transformation of
cystathionine into cysteine, oxobutanoate and ammonia. This is the final reaction in the transsulfuration pathway
that leads from methionine to cysteine in eukaryotes.

- Cystathionine gamma-synthase (EC 4.2.99.9), which catalyzes the conversion of cysteine and succinyl-homoserine
into cystathionine and succinate: the first step in the biosynthesis of methionine from cysteine in bacteria (gene
metB).

- Cystathionine beta-lyase (EC 4.4.1.8) (beta-cystathionase), which catalyzes the conversion of cystathionine into
homocysteine, pyruvate and ammonia: the second step in the biosynthesis of methionine from cysteine in bacteria
(gene metC).

- Methionine gamma-lyase (EC 4.4.1.11) (L-methioninase), which catalyzes the transformation of methionine into
methanethiol, oxobutanoate and ammonia.

- OAH/OAS sulfhydrylase, which catalyzes the conversion of acetylhomoserine into homocysteine and that of
acetylserine into cysteine (gene MET17 or MET25 in yeast).

- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).

- Yeast hypothetical protein YGL184c.

- Yeast hypothetical protein YHR112c.

[0377] These enzymes are proteins of about 400 amino-acid residues. The pyridoxal-P group is attached to a lysine 
residue located in the central section of these enzymes; the sequence around this residue is highly conserved and can 
be used as a signature pattern to detect this class of enzymes.

- Consensus pattern: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH] [K is the
pyridoxal-P attachment site]

[1] Ono B.I., Tanaka K., Naito K., Heike C., Shinoda S., Yamamoto S., Ohmori S., Oshima T., Toh-E A. J. Bacteriol.
174:3339-3347(1992).

[2] Barton A.B., Kaback D.B., Clark M.W., Keng T., Ouellette B.F.F., Storms R.K., Zeng B., Zhong W.W., Fortin
N., Delaney S., Bussey H. Yeast 9:363-369(1993).

[0378] 105. Cyt_reductase

FAD/NAD-binding Cytochrome reductase
Number of members: 60
[1] Medline: 95111952
Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.
[2] Medline: 92084635
The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.
[0379] 106. Cytidylyltrans
Phosphatidate cytidylyltransferase
Number of members: 21

[0380] Phosphatidate cytidylyltransferase (EC 2.7.7.41) [1,2,3] (also known as CDP-diacylglycerol synthase) (CDS)
is the enzyme that catalyzes the synthesis of CDP-diacylglycerol from CTP and phosphatidate (PA). CDP-diacylglycerol
is an important branch point intermediate in both prokaryotic and eukaryotic organisms. CDS is a membrane-bound
enzyme. A conserved region located in the C-terminal part has been selected as a signature pattern.

- Consensus pattern: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMF]-D

[1] Sparrow C.P., Raetz C.R.H.
J. Biol. Chem. 260:12084-12091(1985).
[2] Shen H., Heacock P.N., Clancey C.J., Dowhan W.
J. Biol. Chem. 271:789-795(1996).
[3] Saito S., Goto K., Tonosaki A., Kondo H.
J. Biol. Chem. 272:9503-9509(1997).

[0381] 107. (Cytidylyltransf) Cytidylyltransferase. This family includes: Cholinephosphate cytidylyltransferase;
Glycerol-3-phosphate cytidylyltransferase.
Number of members: 64

[0382] [1] Medline: 10208837 CTP:Phosphocholine Cytidylyltransferase: Insights into Regulatory Mechanisms and
Novel Functions. Clement JM, Kent C; Biochem Biophys Res Commun 1999;257:643-650.

[0383] 108. (cNMP binding) Cyclic nucleotide-binding domain signatures and profile

Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best
studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene
crp) where such a domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel
beta-barrel structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP).

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies
of the cyclic nucleotide-binding domain. The cAPK's are composed of two different subunits: a catalytic chain and a
regulatory chain which contains both copies of the domain. The cGPK's are single chain enzymes that include the
two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to an amino
acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most cAPK.

- Vertebrate cyclic nucleotide-gated ion-channels. Two such cation channels have been fully characterized. One is
found in rod cells where it plays a role in visual signal transduction. It specifically binds to cGMP, leading to an
opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar,
cAMP-binding, channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential
for maintenance of the beta-barrel. Two signature patterns for this domain have been developed. The first pattern is
located within beta-barrels 2 and 3 and contains the first two conserved Gly. The second pattern is located within
beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other invariant residues.

- First consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

- Second consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]

[1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).

[2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).

[3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0384] 109. (cadherin) 

Cadherins extracellular repeated domain signature 

Cadherins [1,2] are a family of animal glycoproteins responsible for calcium-dependent cell-cell adhesion. Cadherins
preferentially interact with themselves in a homophilic manner in connecting cells, thus acting as both receptor and
ligand. A wide number of tissue-specific forms of cadherins are known:

- Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1). 

- Neural (N-cadherin) (CDH2). 

- Placental (P-cadherin) (CDH3). 

- Retinal (R-cadherin) (CDH4). 

- Vascular endothelial (VE-cadherin) (CDH5). 

- Kidney (K-cadherin) (CDH6).

- Cadherin-8 (CDH8).

- Osteoblast (OB-cadherin) (CDH11).

- Brain (BR-cadherin) (CDH12).

- T-cadherin (truncated cadherin) (CDH13).

- Muscle (M-cadherin) (CDH14).

- Liver-intestine (LI-cadherin).

- EP-cadherin.

[0385] Structurally, cadherins are built of the following domains: a signal sequence, followed by a propeptide of about
130 residues, then an extracellular domain of around 600 residues, then a transmembrane region, and finally a
C-terminal cytoplasmic domain of about 150 residues. The extracellular domain can be sub-divided into five parts: there
are four repeats of about 110 residues followed by a region that contains four conserved cysteines. It is suggested that
the calcium-binding region of cadherins is located in the extracellular repeats.

[0386] Cadherins are evolutionarily related to the desmogleins, which are components of intercellular desmosome
junctions involved in the interaction of plaque proteins:

- Desmoglein 1 (desmosomal glycoprotein I).

- Desmoglein 2.

- Desmoglein 3 (Pemphigus vulgaris antigen).

[0387] The Drosophila fat protein [3] is a huge protein of over 5000 amino acids that contains 34 cadherin-like repeats
in its extracellular domain.

[0388] The signature pattern that was developed for the repeated domain is located in its C-terminal extremity,
which is its best conserved region. The pattern includes two conserved aspartic acid residues as well as two
asparagines; these residues could be implicated in the binding of calcium.
[0389] Consensus pattern: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P Sequences known to belong to this class detected by the
pattern: ALL. Note: this pattern is found in the first, second, and fourth copies of the repeated domain. In the third copy
there is a deletion of one residue after the second conserved Asp.
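
The note above implies that a cadherin with four extracellular repeats should give three signature hits, with the third repeat missed. The sketch below illustrates this with a hand-transcribed regular expression. The function name `signature_hits` and all sequences are hypothetical: the repeats are short synthetic stand-ins, and the "third copy" is built by dropping the residue that follows the second Asp, which is one possible reading of the described deletion.

```python
import re

# Hand-transcribed from the cadherin consensus pattern.
CADHERIN_SIGNATURE = re.compile(r"[LIV].[LIV].D.ND[NH].P")

def signature_hits(seq: str) -> list:
    """Return 1-based start positions of non-overlapping signature matches."""
    return [m.start() + 1 for m in CADHERIN_SIGNATURE.finditer(seq)]

matching_copy = "LALADANDNAP"   # satisfies the pattern
third_copy = "LALADANDAP"       # residue after the second Asp removed
spacer = "G" * 10               # placeholder for the rest of each repeat

toy_cadherin = (matching_copy + spacer + matching_copy + spacer
                + third_copy + spacer + matching_copy)
hits = signature_hits(toy_cadherin)
```

Counting hits in this way undercounts repeats by design, so the result is better treated as evidence of family membership than as a repeat count.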

[1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990).
[2] Takeichi M. Trends Genet. 3:213-217(1987).

[3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell 67:853-868(1991).

[0390] 110. Calreticulin family signatures

Calreticulin [1] (also known as calregulin, CRP55 or HACBP) is a high-capacity calcium-binding protein which is present
in most tissues and located at the periphery of the endoplasmic (ER) and the sarcoplasmic reticulum (SR) membranes.
It probably plays a role in the storage of calcium in the lumen of the ER and SR and it may well have other important
functions. Structurally, calreticulin is a protein of about 400 amino acid residues consisting of three domains:

a) An N-terminal, probably globular, domain of about 180 amino acid residues (N-domain);
b) A central domain of about 70 residues (P-domain) which contains three repeats of an acidic 17 amino acid motif.
This region binds calcium with a low capacity but a high affinity;
c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region binds calcium with a high
capacity but a low affinity.

Calreticulin is evolutionarily related to the following proteins:

- Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses a C-terminal domain rich in
lysine and arginine and lacks acidic residues and is therefore not expected to bind calcium in that region.

- Calnexin [2]. A calcium-binding protein that interacts with newly synthesized glycoproteins in the endoplasmic
reticulum. It seems to play a major role in the quality control apparatus of the ER by the retention of incorrectly folded
proteins.

- Calmegin [3] (or calnexin-T), a testis-specific calcium-binding protein highly similar to calnexin.

Three signature patterns have been developed for this family of proteins. The first two patterns are based on conserved
regions in the N-domain; the third pattern corresponds to positions 4 to 16 of the repeated motif in the P-domain.

Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2)

Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]

[1] Michalak M., Milner R.E., Burns K., Opas M. Biochem. J. 285:681-692(1992).
[2] Bergeron J.J.M., Brenner M.B., Thomas D.Y., Williams D.B. Trends Biochem. Sci. 19:124-128(1994).
[3] Watanabe D., Yamada K., Nishina Y., Tajima Y., Koshimizu U., Nagata A., Nishimune Y. J. Biol. Chem.
269:7744-7749(1994).

[0391] 111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase)

Carbonic anhydrases (EC 4.2.1.1) (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the reversible hydration of
carbon dioxide. Eight enzymatic and evolutionarily related forms of carbonic anhydrase are currently known to exist in
vertebrates: three cytosolic isozymes (CA-I, CA-II and CA-III); two membrane-bound forms (CA-IV and CA-VII); a
mitochondrial form (CA-V); a secreted salivary form (CA-VI); and a yet uncharacterized isozyme [5]. In the alga
Chlamydomonas reinhardtii, two CA isozymes have been sequenced [6]. They are periplasmic glycoproteins
evolutionarily related to vertebrate CAs. Some bacteria, such as Neisseria gonorrhoeae [7], also have a eukaryotic-type CA.
CAs contain a single zinc atom bound to three conserved histidine residues. As a signature for CAs, a pattern has
been developed which includes one of these zinc-binding histidines. Protein D8 from Vaccinia and other poxviruses is
related to CAs but has lost two of the zinc-binding histidines as well as many otherwise conserved residues. This is
also true of the N-terminal extracellular domain of some receptor-type tyrosine-protein phosphatases (see
<PDOC00323>).

Consensus pattern: S-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA](2) [The second H is a zinc ligand]
Note: most prokaryotic CA's as well as plant chloroplast CA's belong to another, evolutionarily distinct family of proteins
(see <PDOC00586>).

[1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987).
[2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988).
[3] Tashian R.E. BioEssays 10:186-192(1989).
[4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990).
[5] Skaggs L.A., Bergenhem N.C.H., Venta P.J., Tashian R.E. Gene 126:291-292(1993).
[6] Fujiwara S., Fukuzawa H., Tachiki A., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 87:9779-9783(1990).
[7] Huang S., Xue Y., Sauer-Eriksson E., Chirica L., Lindskog S., Jonsson B.H. J. Mol. Biol. 283:301-310(1998).

[0392] 112. Caseins alpha/beta signature

Caseins [1] are the major protein constituent of milk. Caseins can be classified into two families; the first consists of
the kappa-caseins, and the second groups the alpha-s1, alpha-s2, and beta-caseins. The alpha/beta caseins are a
rapidly diverging family of proteins. However, two regions are conserved: a cluster of phosphorylated serine residues
and the signal sequence. The signature pattern has been developed for this family of proteins based upon the last
eight residues of the signal sequence.

Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A

[1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).

[0393] 113. Catalase signatures

Catalase (EC 1.11.1.6) [1,2,3] is an enzyme, present in all aerobic cells, that decomposes hydrogen peroxide to
molecular oxygen and water. Its main function is to protect cells from the toxic effects of hydrogen peroxide. In eukaryotic
organisms and in some prokaryotes catalase is a molecule composed of four identical subunits. Each of the subunits
binds one protoheme IX group. A conserved tyrosine serves as the heme proximal side ligand. The region around this
residue has been used as a first signature pattern; it also includes a conserved arginine that participates in
heme-binding. A conserved histidine has been shown to be important for the catalytic mechanism of the enzyme. The region
around this residue has been selected as a second signature pattern.

Consensus pattern: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH] [Y is the proximal heme-binding ligand]
Consensus pattern: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST] [H is an active site residue]
Note: some prokaryotic catalases belong to the peroxidase family (see <PDOC00394>).

[1] Murthy M.R.N., Reid T.J. III, Sicignano A., Tanaka N., Rossmann M.G. J. Mol. Biol. 152:465-499(1981).

[2] Melik-Adamyan W.R., Barynin V.V., Vagin A.A., Borisov V.V., Vainshtein B.K., Fita I., Murthy M.R.N., Rossmann
M.G. J. Mol. Biol. 188:63-72(1986).

[3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).

[0394] 114. (chitin binding) Chitin recognition or binding domain signature

A conserved domain of 43 amino acids is found in several plant and fungal proteins that have a common binding
specificity for oligosaccharides of N-acetylglucosamine [1]. This domain may be involved in the recognition or binding
of chitin subunits. It has been found in the proteins listed below.

- A number of non-leguminous plant lectins. The best characterized of these lectins are the three highly homologous
wheat germ agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding lectin
which structurally consists of a fourfold repetition of the 43 amino acid domain. The same type of structure is found
in a barley root-specific lectin as well as a rice lectin.

- Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>). Endochitinases are enzymes that catalyze
the hydrolysis of the beta-1,4 linkages of N-acetyl glucosamine polymers of chitin. Plant chitinases function as a defense
against chitin-containing fungal pathogens. Class IA chitinases generally contain one copy of the chitin-binding domain
at their N-terminal extremity. An exception is agglutinin/chitinase [2] from the stinging nettle Urtica dioica which contains
two copies of the domain.

- Hevein [5], a wound-induced protein found in the latex of rubber trees.

- Win1 and win2, two wound-induced proteins from potato.

- Kluyveromyces lactis killer toxin alpha subunit [3]. The toxin encoded by the linear plasmid pGKL1 is composed of
three subunits: alpha, beta, and gamma. The gamma subunit harbors toxin activity and inhibits growth of sensitive
yeast strains in the G1 phase of the cell cycle; the alpha subunit, which is proteolytically processed from a larger
precursor that also contains the beta subunit, is a chitinase (see <PDOC00839>).

In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal
sequence and is therefore at the N-terminal of the mature protein; in the killer toxin alpha subunit it is located in the central
section of the protein. The domain contains eight conserved cysteine residues which have all been shown, in WGA,
to be involved in disulfide bonds. The topological arrangement of the four disulfide bonds is shown in the following
figure:

xxCgxxxxxxxCxxxxCCsxxgxCgxxxxxCxxxCxxxxC |***^|*********^**||||^ ^. ^ ^»q*- consented cysteine in- 
volved in a disulfide bond.**': positkm of the pattem. 
- Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five C's are involved in disulfide bonds]

[ 1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991).
[ 2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).
[ 3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem. 199:483-488(1991).

[0395] 115. (Chitinase 1) Chitinases family 19 signatures

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or 19 in the
classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes
from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell
wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain
(see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid
residues. Two highly conserved regions have been selected as signature patterns; the first one is located in the
N-terminal section and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and
which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[0396] 116. chloroa_b-bind

Chlorophyll A-B binding proteins. Number of members: 211

[0397] 117. chromo

The 'chromo' (CHRromatin Organization MOdifier) domain [1 to 4] is a conserved region of about 60 amino acids which
was originally found in Drosophila modifiers of variegation, which are proteins that modify the structure of chromatin
to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed.
In protein Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain
a chromo domain seem to fall into three classes:

a) Proteins which have an N-terminal chromo domain followed by a region which is related to but distinct from the
chromo domain and which has been termed [3] the 'chromo shadow' domain.

b) Proteins with a single chromo domain.

c) Proteins with paired tandem chromo domains.

[0398] Currently, this domain has been found in the following proteins:
[0399] Class A.

- Drosophila heterochromatin protein Su(var)205 (HP1).

- Human heterochromatin protein HP1 alpha.

- Mammalian modifier 1 and modifier 2.

- Fission yeast swi6, a protein involved in the repression of the silent mating-type loci mat2 and mat3.

[0400] Class B.

- Drosophila protein Polycomb (Pc).

- Mammalian modifier 3, a homolog of Pc.

- Drosophila protein Su(var)3-9, a suppressor of position-effect variegation.

- Human Mi-2 autoantigen, characteristic of dermatomyositis.

- Fungal retrotransposon polyproteins: 'skippy' from Fusarium oxysporum, 'grasshopper' and 'MAGGY' from
Magnaporthe grisea and CfT-1 from Cladosporium fulvum.

- Fission yeast hypothetical protein SpAC18G6.02c.

- Caenorhabditis elegans hypothetical protein C29H12.5.

- Caenorhabditis elegans hypothetical protein ZK1236.2.

- Caenorhabditis elegans hypothetical protein T09A5.8.

[0401] Class C.

- Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.

- Yeast protein CHD1.

[0402] The signature pattern for this domain corresponds to its best conserved section, which is located in its central
part.

- Consensus pattern: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLME]-x(5,6)-[ST]-...

[ 1] Paro R. Trends Genet. 6:416-421(1990).
[ 2] Singh P.B., Miller J.R., Pearce J., Kothary R., Burton R.D., Paro R., James T.C., Gaunt S.J. Nucleic Acids Res.
19:789-794(1991).
[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995).
[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).

[0403] 118. citrate_synt

Citrate synthase (EC 4.1.3.7) (CS) is the tricarboxylic acid cycle enzyme that catalyzes the synthesis of citrate from
oxaloacetate and acetyl-CoA in an aldol condensation. CS can directly form a carbon-carbon bond in the absence of
metal ion cofactors.

[0404] In prokaryotes, citrate synthase is composed of six identical subunits. In eukaryotes, there are two isozymes
of citrate synthase: one is found in the mitochondrial matrix, the second is cytoplasmic. Both seem to be dimers of
identical chains.

[0405] There are a number of regions of sequence similarity between prokaryotic and eukaryotic citrate synthases.
One of the best conserved contains a histidine which is one of three residues shown [1] to be involved in the catalytic
mechanism of the vertebrate mitochondrial enzyme. This region has been used as a signature pattern.

- Consensus pattern: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R [H is an active site residue]

[0406] [1] Karpusas M., Branchaud B., Remington S.J. Biochemistry 29:2213-2219(1990).
[0407] 119. clpA_B
Chaperonin clpA/B

CAUTION! This family is a subfamily of the AAA superfamily. The threshold has been set very high to stop overlaps
with the AAA superfamily. This entry will be subsumed by AAA in the future.
Number of members: 39

[0408] A number of ATP-binding proteins that are thought to protect cells from extreme stress by controlling the
aggregation or denaturation of vital cellular structures have been shown [1,2] to be evolutionarily related. These proteins
are listed below.

- Escherichia coli clpA, which acts as the regulatory subunit of the ATP-dependent protease clp.

- Rhodopseudomonas blastica clpA homolog.

- Escherichia coli heat shock protein clpB and homologs in other bacteria.

- Bacillus subtilis protein mecB.

- Yeast heat shock protein 104 (gene HSP104), which is vital for tolerance to heat, ethanol and other stresses.

- Neurospora heat shock protein hsp98.

- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].

- CD4A and CD4B, two highly related tomato proteins that seem to be located in the chloroplast.

- Trypanosoma brucei protein clp.

- Porphyra purpurea chloroplast encoded clpC.

[0409] The sizes of these proteins range from 84 Kd (clpA) to slightly more than 100 Kd (HSP104). They all share
two conserved regions of about 200 amino acids that each contain an ATP-binding site. In addition to the ATP-binding
A and B motifs there are many parts in these two domains that are also conserved. Two of these regions have been
selected as signature patterns. The first signature is located in the first domain, some ten residues to the C-terminal
of the ATP-binding B motif. The second pattern is located in the second domain, in between the ATP-binding A and B
motifs.

- Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

- Consensus pattern: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KRQ]-x-[STA]-x-[STA]-[KR]-[LIVM]-...

[ 1] Gottesman S., Squires C., Pichersky E., Carrington M., Hobbs M., Mattick J.S., Dalrymple B., Kuramitsu H.,
Shiroza T., Foster T., Clark W.P., Ross B., Squires C.L., Maurizi M.R. Proc. Natl. Acad. Sci. U.S.A. 87:3513-3517
(1990).
[ 2] Parsell D.A., Sanchez Y., Stitzel J.D., Lindquist S. Nature 353:270-273(1991).
[ 3] Leonhardt S.A., Fearon K., Danese P.N., Mason T.L. Mol. Cell. Biol. 13:6304-6313(1993).

[0410] 120. cofilin_ADF
Cofilin/tropomyosin-type actin-binding proteins

[1]
Medline: 97290449
Structure determination of yeast cofilin.
Fedorov AA, Lappalainen P, Fedorov EV, Drubin DG, Almo SC;
Nat Struct Biol 1997;4:366-369.

[2]
Medline: 97290450
Crystal structure of the actin-binding protein actophorin from Acanthamoeba.
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE;
Nat Struct Biol 1997;4:369-373.

[3]
Medline: 97420794
F-actin and G-actin binding are uncoupled by mutation of conserved tyrosine residues in maize actin depolymerizing
factor.
Jiang CJ, Weeds AG, Khan S, Hussey PJ;
Proc Natl Acad Sci U S A 1997;94:9973-9978.

[4]
Medline: 97357155
Cofilin promotes rapid actin filament turnover in vivo.
Lappalainen P, Drubin DG;
Nature 1997;388:78-82.

Severs actin filaments and binds to actin monomers.
Number of members: 44

[0411] Actin-depolymerizing proteins sever actin filaments (F-actin) and/or bind to actin monomers, or G-actin, thus
preventing actin polymerization by sequestering the monomers. The following proteins are evolutionarily related and
belong to a family of low molecular weight (137 to 166 residues) actin-depolymerizing proteins [1,2,3,4]:

- Cofilin from vertebrates, slime mold and yeast. Cofilin binds to F-actin and acts as a pH-dependent
actin-depolymerizing protein.

- Destrin from vertebrates. Destrin binds to G-actin in a pH-independent manner and prevents polymerization.

- Caenorhabditis elegans unc-60.

- Acanthamoeba castellanii actophorin.

- Plant actin depolymerizing factor (ADF).

[0412] The most conserved region of these proteins is a twenty amino-acid segment that ends some 30 residues
from their C-terminal extremity. This segment has been shown [5] to be important for actin-binding.

- Consensus pattern: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR]

[ 1] Hawkins M., Pope B., Maciver S.K., Weeds A.G. Biochemistry 32:9985-9993(1993).
[ 2] Iida K., Moriyama K., Matsumoto S., Kawasaki H., Nishida E., Yahara I. Gene 124:115-120(1993).
[ 3] Quirk S., Maciver S.K., Ampe C., Doberstein S.K., Kaiser D.A., van Damme J., Vandekerckhove J., Pollard T.D.
Biochemistry 32:8525-8533(1993).
[ 4] McKim K.S., Matheson C., Marra M.A., Wakarchuk M.F., Baillie D.L. Mol. Gen. Genet. 242:346-357(1994).
[ 5] Moriyama K., Yonezawa N., Sakai H., Yahara I., Nishida E. J. Biol. Chem. 267:7240-7244(1992).

[0413] 121. (Complex 24kd) Respiratory-chain NADH dehydrogenase 24 Kd subunit signature

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase)
is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the
chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of
this bioenergetic enzyme complex there is one with a molecular weight of 24 Kd (in mammals), which is a component of the
iron-sulfur (IP) fragment of the enzyme. It seems to bind a 2Fe-2S iron-sulfur cluster. The 24 Kd subunit is nuclear
encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa. The 24 Kd subunit is highly
similar to [3,4]:

- Subunit E of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoE).

- Subunit NQO2 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

A highly conserved region, located in the central section of this subunit and containing two conserved cysteines that
are probably involved in the binding of the 2Fe-2S center, has been selected as a signature pattern.

- Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative 2Fe-2S ligands]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).
[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[0414] 122. copper-bind

Copper binding proteins, plastocyanin/azurin family

Number of members: 70

[0415] Blue or 'type-1' copper proteins are small proteins which bind a single copper atom and which are
characterized by an intense electronic absorption band near 600 nm [1,2]. The best known members of this class of proteins
are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly related
bacterial azurins, which exchange electrons with cytochrome c551. This family of proteins also includes all the proteins
listed below (references are only provided for recently determined sequences).

- Amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on
methylamine. Amicyanin appears to be an electron receptor for methylamine dehydrogenase.

- Auracyanins A and B from Chloroflexus aurantiacus [3]. These proteins can donate electrons to cytochrome c-654.

- Blue copper protein from Alcaligenes faecalis.

- Cupredoxin (CPC) from cucumber peelings [4].

- Cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber.

- Halocyanin from Natrobacterium pharaonis [5], a membrane associated copper-binding protein.

- Pseudoazurin from Pseudomonas.

- Rusticyanin from Thiobacillus ferrooxidans. Rusticyanin is an electron carrier from cytochrome c-552 to the a-type
oxidase [6].

- Stellacyanin from the Japanese lacquer tree.

- Umecyanin from horseradish roots.

- Allergen Ra3 from ragweed. This pollen protein is evolutionarily related to the above proteins, but seems to have
lost the ability to bind copper.

[0416] Although there is an appreciable amount of divergence in the sequence of all these proteins, the copper ligand
sites are conserved and a pattern which includes two of the ligands (a cysteine and a histidine) has been developed.

- Consensus pattern: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-x(0,1)-H-x(2,4)-[MQ] [C and H are copper
ligands]

[ 1] Garrett T.P.J., Clingeleffer D.J., Guss J.M., Rogers S.J., Freeman H.C. J. Biol. Chem. 259:2822-2825(1984).
[ 2] Ryden L.G., Hunt L.T. J. Mol. Evol. 36:41-66(1993).
[ 3] McManus J.D., Brune D.C., Han J., Sanders-Loehr J., Meyer T.E., Cusanovich M.A., Tollin G., Blankenship R.E.
J. Biol. Chem. 267:6531-6540(1992).
[ 4] Mann K., Schaefer W., Thoenes U., Messerschmidt A., Mehrabian Z., Nalbandyan R. FEBS Lett. 314:220-223
(1992).
[ 5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945
(1994).
[ 6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).

[0417] 123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
They seem to assist other polypeptides in maintaining or assuming conformations which permit their correct assembly
into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins
form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as cpn60
(groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn10 protein binds to cpn60 in the
presence of MgATP and suppresses the ATPase activity of the latter. Cpn10 is a protein of about 100 amino acid
residues whose sequence is well conserved in bacteria, vertebrate mitochondria and plant chloroplasts [3,4]. Cpn10
assembles as a heptamer that forms a dome [5]. As a signature pattern for cpn10, a region located in the N-terminal
section of the protein was selected.

Consensus pattern: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY]...
Note: this pattern is found twice in the plant chloroplast protein, which consists of the tandem repeat of a cpn10 domain.

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).
[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).
[ 3] Hartman D.J., Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A. 89:3394-3398(1992).
[ 4] Bertsch U., Soll J., Seetharam R., Viitanen P.V. Proc. Natl. Acad. Sci. U.S.A. 89:8696-8700(1992).
[ 5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

[0418] 124. Chaperonins cpn60 signature (cpn60_TCP1)

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
Their role seems to be to assist other polypeptides to maintain or assume conformations which permit their correct
assembly into oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria.
Chaperonins form oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as
cpn60 (groEL in bacteria) and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn60 protein shows weak
ATPase activity and is a highly conserved protein of about 550 to 580 amino acid residues which has been described
by different names in different species:

- Escherichia coli groEL protein, which is essential for the growth of the bacteria and the assembly of several
bacteriophages.

- Cyanobacterial groEL analogues.

- Mycobacterium tuberculosis and leprae 65 Kd antigen, Coxiella burnetii heat shock protein B (gene htpB), Rickettsia
tsutsugamushi major antigen 58, and Chlamydial 57 Kd hypersensitivity antigen (gene hypB).

- Chloroplast RuBisCO subunit binding-protein alpha and beta chains, which bind ribulose bisphosphate carboxylase
small and large subunits and are implicated in the assembly of the enzyme oligomer.

- Mammalian mitochondrial matrix protein P1 (mitonin or P60).

- Yeast HSP60 protein, a mitochondrial assembly factor.

As a signature pattern for these proteins, a rather well-conserved region of twelve residues, located in the last third
of the cpn60 sequence, was chosen.

Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]
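The statement that the cpn60 signature spans twelve residues can be checked against the consensus pattern itself by adding up the residues each pattern element covers. The following is a minimal sketch, assuming the usual PROSITE-style syntax used throughout these entries (elements separated by '-', 'x' for any residue, '[...]' for allowed residues, '{...}' for excluded residues, '(n)' or '(n,m)' for repeats). The function name `pattern_length_range` is hypothetical, and the final element of the cpn60 pattern is read here as [GA].

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count (n) or a repeat range (n,m).
_ELEMENT = re.compile(r"(?:x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def pattern_length_range(pattern):
    """Return (min, max) number of residues a consensus pattern can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unrecognized pattern element: %r" % element)
        low = int(match.group(1)) if match.group(1) else 1
        high = int(match.group(2)) if match.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


# cpn60 signature, with the last element read as [GA] (an assumption)
cpn60_span = pattern_length_range("A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]")
```

A pattern with a variable gap, such as x(4,5), yields different minimum and maximum values, which is why the function returns a range rather than a single number.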

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).
[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[0419] Chaperonins TCP-1 signatures (cpn60_TCP1)

The TCP-1 protein [1,2] (Tailless Complex Polypeptide 1) was first identified in mice, where it is especially abundant in
testis but present in all cell types. It has since been found and characterized in many other mammalian species, in
Drosophila and in yeast. TCP-1 is a highly conserved protein of about 60 Kd (556 to 560 residues) which participates
in a hetero-oligomeric 900 Kd double-torus shaped particle [3] with 6 to 8 other different subunits. These subunits, the
chaperonin containing TCP-1 (CCT) subunits beta, gamma, delta, epsilon, zeta and eta, are evolutionarily related to
TCP-1 itself [4,5]. The CCT is known to act as a molecular chaperone for tubulin, actin and probably some other proteins.

[0420] The CCT subunits are highly related to archaebacterial counterparts:

- TF55 and TF56 [6], a molecular chaperone from Sulfolobus shibatae. TF55 has ATPase activity, is known to bind
unfolded polypeptides and forms an oligomeric complex of two stacked nine-membered rings.

- Thermosome [7], from Thermoplasma acidophilum. The thermosome is composed of two subunits (alpha and beta)
and also seems to be a chaperone with ATPase activity. It forms an oligomeric complex of eight-membered rings.

The TCP-1 family of proteins is weakly, but significantly [8], related to the cpn60/groEL chaperonin family (see
<PDOC00268>). As signature patterns of this family of chaperonins, three conserved regions located in the N-terminal
domain were chosen.

Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2)
Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-...
Consensus pattern: [QDEK]-x-x-[LIVMGTA]-[GA]-D-G-T

[ 1] Ellis J. Nature 358:191-192(1992).
[ 2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992).
[ 3] Lewis V.A., Hynes G.M., Zheng D., Saibil H., Willison K.R. Nature 358:249-252(1992).
[ 4] Kubota H., Hynes G., Carne A., Ashworth A., Willison K.R. Curr. Biol. 4:89-99(1994).
[ 5] Kim S., Willison K.R., Horwich A.L. Trends Biochem. Sci. 20:543-548(1994).
[ 6] Trent J.D., Nimmesgern E., Wall J.S., Hartl F.U., Horwich A.L. Nature 354:490-493(1991).
[ 7] Waldmann T., Lupas A., Kellermann J., Peters J., Baumeister W. Biol. Chem. Hoppe-Seyler 376:119-126(1995).
[ 8] Hemmingsen S.M. Nature 357:650-650(1992).

[0421] 125. cyclin (Cyclins)

The cyclins include an internal duplication, which is related to that found in TFIIB and the RB protein.

[1]
Medline: 94203808
Evidence for a protein domain superfamily shared by the cyclins, TFIIB and RB/p107.
Gibson TJ, Thompson JD, Blocker A, Kouzarides T;
Nucleic Acids Res 1994;22:946-952.

[2]
Medline: 96164440
The crystal structure of cyclin A.
Brown NR, Noble MEM, Endicott JA, Garman EF, Wakatsuki S, Mitchell E, Rasmussen B, Hunt T, Johnson LN;
Structure 1995;3:1235-1247.
Complex of cyclin and cyclin dependent kinase.

[3]
Medline: 96313126
Structural basis of cyclin-dependent kinase activation by phosphorylation.
Russo AA, Jeffrey PD, Pavletich NP;
Nat Struct Biol 1996;3:696-700.

Cyclins regulate cyclin dependent kinases (CDKs).

The most divergent prosite members have been included. Swiss:P22674, the Uracil-DNA glycosylase 2, is the highest
noise and may be related but has not been included.
Number of members: 189

[0422] Cyclins [1,2,3] are eukaryotic proteins which play an active role in controlling nuclear cell division cycles.
Cyclins, together with the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF). There are two
main groups of cyclins:

- G2/M cyclins, essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate
steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase).

- G1/S cyclins, essential for the control of the cell cycle at the G1/S (start) transition.

[0423] In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two
G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

[0424] A cyclin homolog has also been found in herpesvirus saimiri [4].

[0425] The best conserved region is in the central part of the cyclins' sequences, known as the 'cyclin-box'. From
this, a 32 residue pattern has been derived.

- Consensus pattern: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-
x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW]

[ 1] Nurse P. Nature 344:503-508(1990).
[ 2] Norbury C., Nurse P. Curr. Biol. 1:23-24(1991).
[ 3] Lew D.J., Reed S.I. Trends Cell Biol. 2:77-81(1992).
[ 4] Nicholas J., Cameron K.R., Honess R.W. Nature 355:362-365(1992).

[0426] 126. Cystatin domain

This is a very diverse family. Attempts to define separate subfamilies have failed. Typically, either the N-terminal or
C-terminal end is very divergent, but splitting into two domains would make very short families. Cathelicidins are related
to this family but have not been included. Number of members: 147

[0427] Inhibitors of cysteine proteases [1,2,3], which are found in the tissues and body fluids of animals, in the larva
of the worm Onchocerca volvulus [4], as well as in plants, can be grouped into three distinct but related families:

- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor
carbohydrate groups.

- Type 2 cystatins, molecules of about 115 amino acid residues which contain one or two disulfide loops near their
C-terminus.

- Kininogens, which are multifunctional plasma glycoproteins.

[0428] They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to
position optimally prekallikrein and factor XI next to factor XII. They are also inhibitors of cysteine proteases. Structurally,
kininogens are made of three contiguous type-2 cystatin domains, followed by an additional domain (of variable length)
which contains the sequence of bradykinin. The first of the three cystatin domains seems to have lost its inhibitory
activity.

[0429] In all these inhibitors, there is a conserved region of five residues which has been proposed to be important
for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region.

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-
[DENQKRHSIV]

[1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987).
[2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990).
[3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).
[4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991).

[0430] 127. cytochrome_c (Cytochrome c)
The Pfam entry does not include all prosite members.
The cytochrome 556 and cytochrome c' families are not included.
Number of members: 259

[0431] In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by thioether bonds to
two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His and the histidine residue
is one of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to the
cytochrome c family, which presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and
reaction center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}
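Consensus patterns of this kind can be applied to a protein sequence by translating them into regular expressions. The following is a minimal sketch, assuming the usual PROSITE-style conventions ('-' separates positions, 'x' is any residue, '[...]' lists allowed residues, '{...}' lists excluded residues, '(n)' and '(n,m)' are repeat counts). The function names `prosite_to_regex` and `find_sites` and the test sequence are hypothetical and serve only as illustration; anchors and other extended syntax are not handled.

```python
import re

_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"   # residue specification
    r"(?:\((\d+)(?:,(\d+))?\))?"          # optional (n) or (n,m) repeat
)


def prosite_to_regex(pattern):
    """Translate a '-'-separated consensus pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unrecognized pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(core + repeat)
    return "".join(parts)


def find_sites(sequence, pattern):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    lookahead = "(?=" + prosite_to_regex(pattern) + ")"
    return [m.start() for m in re.finditer(lookahead, sequence)]


CYTOCHROME_C = "C-{CPWHF}-{CPWR}-C-H-{CFYW}"
# Hypothetical toy sequence containing one Cys-X-X-Cys-His site
sites = find_sites("AAACAACHAAA", CYTOCHROME_C)
```

The lookahead wrapper is a design choice that lets overlapping sites be reported; a plain search would skip any site that begins inside a previous match.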

[0432] [ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985).

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed)

[0434] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed
to be an accessory domain: its function is unknown.

[0435] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.

[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed)

[0437] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain
is assumed from the finding of bacterial homologues.

[0438] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T, Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H, J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, van der Wal J, Vandekerckhove
J, van Damme J, Gussow D, Ploegh HL, van Blitterswijk WJ, van der Bend RL, FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F, Trends Biochem Sci 1990;15:47-50.

[0439] 130. D-amino acid oxidases signature (DAO)

[0440] D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of
neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced
in fungi and vertebrates, where they are known to be located in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1)
(DASOX) [1] is an enzyme, structurally related to DAO, which catalyzes the same reaction but is active only toward
dicarboxylic D-amino acids. In DAO, a conserved histidine has been shown [2] to be important for the enzyme's catalytic
activity. The conserved region around this residue has been developed as a signature pattern for these enzymes.

[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]

[ 1] Negri A., Ceciliani F., Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).
[ 2] Miyano M., Fukui K., Watanabe F., Takahashi S., Tada M., Kanashiro M., Miyake Y. J. Biochem. 109:171-177
(1991).

s [0442] 1 31 . DEAD and DEAH box families ATP-dependent helicases signatures 

A number of eukaryotic and prokaryotk: proteins have been characterized [1 .2,3] on the basis of their structural simi- 
larlly. They all seem to be involved In ATP-dependent, nucleic-acki unwIrKling. Proteins currently known to betong to 
this family are: - Initiatbn factor e!F-4A Found in eukaryotes, this protein is a subunrt of a high molecular weight 
complex involved in 5'cap recogniton and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase. 

10 . PRP5 and PRP28. These yeast proteins are involved in varbus ATP-requiring steps of the pre-mRNA splicing process. 

- PI10, a mouse protein expressed specifk^ally during spermatogenesis. - An3. a Xenopus putative RNA helicase, 
closely related to PI10. - SPP81/DED1 and DBP1. two yeast proteins probably involved in pre-mRNA splicing and 
related to PI10. - Caenorhabditis elegans helicase gIh-1. - MSS116, a yeast protein required for mitochondrial splicing, 

- SPB4, a yeast protein involved in the maturation of 25S rtoosomal RNA. - p68. a human nuclear antigen. p68 has 
IS ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division. - Rm62 (p62). a Drosophlla 

putative RNA helicase related to p68. - DBP2, a yeast protein related to p68. - DHH1 . a yeast protein. - DRS1 . a yeast 
protein involved In ribosome assembly. - MAK5. a yeast protein involved in maintenance of dsRNA killer plasmld. - 
ROK1, a yeast protein. - ste13, a fission yeast protein. - Vasa, a Drosophila protein important for oocyte formation and
specification of embryonic posterior structures. - Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase. - deaD, an Escherichia coli putative RNA helicase which can
suppress a mutation in the rpsB gene for ribosomal protein S2. - rhlB, an Escherichia coli putative RNA helicase. -
rhlE, an Escherichia coli putative RNA helicase. - srmB, an Escherichia coli protein that shows RNA-dependent ATPase
activity. It probably interacts with 23S ribosomal RNA. - Caenorhabditis elegans hypothetical proteins T26G10.1,
ZK512.2 and ZK686.2. - Yeast hypothetical protein YHR065c. - Yeast hypothetical protein YHR169w. - Fission yeast
hypothetical protein SpAC31A2.07c. - Bacillus subtilis hypothetical protein yxiN. All these proteins share a number of
conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding
proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the 'D-E-A-D-box',
represents a special version of the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which
have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known
to belong to this subfamily are: - PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various
ATP-requiring steps of the pre-mRNA splicing process. - Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males, for dosage compensation of X chromosome linked genes. -
RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or
cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the
homologs of RAD3. - Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle
progression in G(2)/M. - Yeast TPS1. - Yeast hypothetical protein YKL078w. - Caenorhabditis elegans hypothetical
proteins C06E1.10 and K03H1.2. - Poxviruses' early transcription factor 70 Kd subunit which acts with RNA polymerase
to initiate transcription from early gene promoters. - I8, a putative vaccinia virus helicase. - hrpA, an Escherichia coli
putative RNA helicase. Signature patterns for both subfamilies were developed.

[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant
entry <PDOC00017>).
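The two subfamily signatures can be applied to a protein sequence with ordinary regular expressions. The following is a minimal sketch, not part of the original disclosure. It assumes the patterns read as [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN] and [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]; the function name and the test sequences are hypothetical and chosen only for illustration.

```python
import re

# Regular-expression readings of the two consensus patterns (assumed transcription).
# 'x' in the pattern means any residue, '(n)' means n repeats.
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")


def classify_helicase_motif(seq):
    """Return 'D-E-A-D', 'D-E-A-H', or None for a one-letter protein sequence.

    Hypothetical helper: it only reports which signature matches first and
    says nothing about whether the protein is a functional helicase.
    """
    if DEAD_BOX.search(seq):
        return "D-E-A-D"
    if DEAH_BOX.search(seq):
        return "D-E-A-H"
    return None


if __name__ == "__main__":
    # Made-up sequences, not taken from any database entry.
    print(classify_helicase_motif("MKVLDEADRMLDM"))
    print(classify_helicase_motif("AAGKLIVDEAHNAA"))
```

A match indicates only that the signature is present; as the note above states, members of the family are also expected to carry the P-loop motif.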

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).
[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

[0444] 132. (DHBP_synthase) 3,4-dihydroxy-2-butanone 4-phosphate synthase

[0445] 3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the
biosynthetic precursor for the xylene ring of riboflavin. Sometimes found as a bifunctional enzyme with GTP_cyclohydro2.
[0446] Richter G, Krieger C, Volk R, Kis K, Ritz H, Gotze E, Bacher A, Methods Enzymol 1997;280:374-382.
[0447] 133. (DHDPS) Dihydrodipicolinate synthetase signatures

Dihydrodipicolinate synthetase (EC 4.2.1.52) (DHDPS) [1] catalyzes, in higher plants chloroplast and in many bacteria
(gene dapA), the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible for
the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to
the enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally related to DHDPS and
probably also act via a similar catalytic mechanism: - Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene
nanA), which catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate. -
Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine 3-O-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH. Two signature patterns for these enzymes were developed. The first one
is centered on a highly conserved region in the N-terminal part of these proteins. The second signature contains a lysine
residue which has been shown, in Escherichia coli dapA [2], to be the one that forms a Schiff-base with the substrate.
[0448] Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]
Consensus pattern: Y-[DNS]-[LIVMFA]-P-x(2)-[ST]-x(3)-[LIVMG]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-
[STAC] [K is involved in Schiff-base formation]

[ 1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).

[ 2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).

[ 3] Murphy P.J., Trenz S.P., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).

[0449] 134. (DHOdehase) Dihydroorotate dehydrogenase signatures

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of
pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria
(gene pyrD), DHOdehase is located on the inner side of the cytosolic membrane. In some yeasts, such as in
Saccharomyces cerevisiae (gene URA1), it is a cytosolic protein while in other eukaryotes it is found in the mitochondria [1].
The sequence of DHOdehase is rather well conserved and two signature patterns were developed specific to this
enzyme. The first corresponds to a region in the N-terminal section of the enzyme while the second is located in the
C-terminal section and seems to be part of the FAD-binding domain.

Consensus pattern: [GS]-x(4)-[GK]-[GSTA]-[LIVRSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-P-[RT]
[0450] Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A
[0451] [ 1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992).
[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase
[0453] 136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures

C-5 cytosine-specific DNA methylases (EC 2.1.1.73) (C5 Mtase) are enzymes that specifically methylate the C-5 carbon
of cytosines in DNA [1,2,3]. Such enzymes are found in the proteins described below. - As a component of type II
restriction-modification systems in prokaryotes and some bacteriophages. Such enzymes recognize a specific DNA
sequence where they methylate a cytosine. In doing so, they protect DNA from cleavage by type II restriction enzymes
that recognize the same sequence. The sequences of a large number of type II C-5 Mtases are known. - In vertebrates,
there are a number of C-5 Mtases that methylate CpG dinucleotides. The sequence of the mammalian enzyme is
known. C-5 Mtases share a number of short conserved regions. Two of them were selected. The first is centered around
a conserved Pro-Cys dipeptide in which the cysteine has been shown [4] to be involved in the catalytic mechanism; it
appears to form a covalent intermediate with the C6 position of cytosine. The second region is located at the C-terminal
extremity in type-II enzymes.

[0454] Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]
Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-

[ 1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-263(1988).

[ 2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G. Nucleic Acids Res. 22:1-10
(1994).

[ 3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).

[ 4] Chen L., McMillan A.M., Chang W., Ezaz-Nikpay K., Lane W.S., Verdine G.L. Biochemistry 30:11018-11025
(1991).

[0455] 137. (DNA_photolyase) DNA photolyases class 2 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to
UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane
ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors
for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTFH) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna,
while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities
[3] DNA photolyases can be grouped into two classes. The second class contains enzymes from Myxococcus xanthus,
methanogenic archaebacteria, insects, fish and marsupial mammals. It is not yet known what second cofactor is bound
to class 2 enzymes. There are a number of conserved sequence regions in all known class 2 DNA photolyases,
especially in the C-terminal part. Two of these regions were selected as signature patterns.
Consensus pattern: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F-
Consensus pattern: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N-
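The consensus patterns in this section use a compact notation: residues separated by hyphens, x for any residue, [..] for a set of allowed residues, {..} for excluded residues, and (n) or (n,m) for repeat counts. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression for scanning sequences. The function name and the example sequence are hypothetical, and anchors such as < and > are deliberately not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a Python regex string.

    Handles single residues, x (any residue), [..] sets, {..} exclusions and
    repeat counts (n) or (n,m). A trailing hyphen, as printed in the text, is ignored.
    """
    pieces = []
    for element in pattern.strip(" -.").split("-"):
        # Split each element into its core and an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        core, low, high = match.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        pieces.append(core + repeat)
    return "".join(pieces)


if __name__ == "__main__":
    regex = prosite_to_regex("F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F-")
    print(regex)
    # Made-up sequence containing one occurrence of the motif.
    print(bool(re.search(regex, "MMFAEEKLVRRELAANFKK")))
```

The same helper can be applied to any of the other consensus patterns listed in this section once they are written in the notation above.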

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987). 
[ 2] Jorns M.S. Biofactors 2:207-211(1990).

[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151
(1994).

[0456] (DNA_photolyase2) DNA photolyases class 1 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to
UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the
cyclobutane ring joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two
chromophore-cofactors for its activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTFH) or an oxidized
8-hydroxy-5-deazaflavin (8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an
antenna, while the FADH2 chromophore is thought to be responsible for electron transfer. On the basis of sequence
similarities [3] DNA photolyases can be grouped into two classes. The first class contains enzymes from Gram-negative
and Gram-positive bacteria, the halophilic archaebacteria Halobacterium halobium, fungi and plants. Class 1 enzymes
bind either 5,10-MTHF (E. coli, fungi, etc.) or 8-HDF (S. griseus, H. halobium). This family also includes Arabidopsis
cryptochromes 1 (CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue light-induced gene
expression. There are a number of conserved sequence regions in all known class 1 DNA photolyases, especially in the
C-terminal part. Two of these regions were selected as signature patterns.
[0457] Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]-
Consensus pattern: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ]-

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987). 
[ 2] Jorns M.S. Biofactors 2:207-211(1990).

[ 3] Yasui A., Eker A.P.M., Yasuhira S., Yajima H., Kobayashi T., Takao M., Oikawa A. EMBO J. 13:6143-6151
(1994).

[ 4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).

[0458] 138. (DNA_pol_A)

DNA polymerase family A signature

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate replication of DNA. They
require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. On the basis
of sequence similarities a number of DNA polymerases have been grouped together [1,2,3] under the designation of
DNA polymerase family A. The polymerases that belong to this family are listed below.

- Escherichia coli and various other bacterial polymerase I (gene polA).
- Thermus aquaticus Taq polymerase.
- Bacteriophage sp01 polymerase.
- Bacteriophage sp02 polymerase.
- Bacteriophage T5 polymerase.
- Bacteriophage T7 polymerase.
- Mycobacteriophage L5 polymerase.
- Yeast mitochondrial polymerase gamma (gene MIP1).

[0459] Five regions of similarity are found in all the above polymerases. One of these conserved regions, known as
'motif B' [1], is located in a domain which, in Escherichia coli polA, has been shown to bind deoxynucleotide triphosphate
substrates; it contains a conserved tyrosine which has been shown, by photo-affinity labelling, to be in the active site;
a conserved lysine, also part of this motif, can be chemically labelled, using pyridoxal phosphate. This conserved region
was used as a signature for this family of DNA polymerases.

[0460] Consensus pattern: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA] Sequences known
to belong to this class detected by the pattern: ALL.

[ 1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).
[ 2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[ 3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993).



[0461] 139. DNA_pol_viral_C

DNA polymerase (viral) C-terminal domain

Number of members: 128

[0462] 140. (DNA_topoisoII)

DNA topoisomerase II signature

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three subunits (the product of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation; it also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[0463] There are many regions of sequence homology between the different subtypes of topoisomerase II. The
relation between the different subunits is shown in the following representation:



<------------------- About 1400 residues ------------------->

[ Protein 39-* ][ ------ Protein 52 ------ ]   Phage T4
[ gyrB-* ][ gyrA ]                             Prokaryote II, Archaebacteria
[ parE-* ][ parC ]                             Prokaryote IV
[ * ]                                          Eukaryote and ASF

'*': Position of the pattern.



[0464] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[0465] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG] Sequences known to belong to this class detected by
the pattern: ALL.

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).

[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991).

[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).



[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a
phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified
into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The
currently known PTPases are listed below: Soluble PTPases. - PTPN1 (PTP-1B). - PTPN2 (T-cell PTPase; TC-PTP). -
PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and
could act at junctions between the membrane and cytoskeleton. - PTPN5 (STEP). - PTPN6 (PTP-1C; HCP; SHP) and
PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at their N-terminal extremity.
The Drosophila protein corkscrew (gene csw) also belongs to this subgroup. - PTPN7 (LC-PTP; Hematopoietic
protein-tyrosine phosphatase; HePTP). - PTPN8 (70Z-PEP). - PTPN9 (MEG2). - PTPN12 (PTP-G1; PTP-P19). - Yeast PTP1.
- Yeast PTP2 which may be involved in the ubiquitin-mediated protein degradation pathway. - Fission yeast pyp1 and
pyp2 which play a role in inhibiting the onset of mitosis. - Fission yeast pyp3 which contributes to the dephosphorylation
of cdc2. - Yeast CDC14 which may be involved in chromosome segregation. - Yersinia virulence plasmid PTPases (gene
yopH). - Autographa californica nuclear polyhedrosis virus 19 Kd PTPase. Dual specificity PTPases. - DUSP1 (PTPN10;
MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185. - DUSP2
(PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr residues. -
DUSP3 (VHR). - DUSP4 (HVH2). - DUSP5 (HVH3). - DUSP6 (Pyst1; MKP-3). - DUSP7 (Pyst2; MKP-X). - Yeast MSG5,
a PTPase that dephosphorylates MAP kinase FUS3. - Yeast YVH1. - Vaccinia virus H1 PTPase; a dual specificity
phosphatase. Receptor PTPases. Structurally, all known receptor PTPases are made up of a variable length
extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the
receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic
anhydrase-like domains in their extracellular region. The cytoplasmic region generally contains two copies of the
PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect substrate
specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably
important, residues are not. In the following table, the domain structure of known receptor PTPases is shown:

                                        Extracellular          Intracellular
                                        Ig   FN-3  CAH  MAM    PTPase
Leukocyte common antigen (LCA) (CD45)   0    2     0    0      2
Leukocyte antigen related (LAR)         3    8     0    0      2
Drosophila DLAR                         3    9     0    0      2
Drosophila DPTP                         2    2     0    0      2
PTP-alpha (LRP)                         0    0     0    0      2
PTP-beta                                0    16    0    0      1
PTP-gamma                               0    1     1    0      2
PTP-delta                               0    >7    0    0      2
PTP-epsilon                             0    0     0    0      2
PTP-kappa                               1    4     0    1      2
PTP-mu                                  1    4     0    1      2
PTP-zeta                                0    1     1    0      2

PTPase domains consist of about 300 amino acids. There are two conserved
cysteines; the second one has been shown to be absolutely required for activity. Furthermore, a number of conserved
residues in its immediate vicinity have also been shown to be important. A signature pattern for PTPase domains was
derived centered on the active site cysteine. There are three profiles for PTPases; the first one spans the complete
domain and is not specific to any subtype. The second profile is specific to dual-specificity PTPases and the third one
to the PTP subfamily.

[0467] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).

[ 5] Hunter T. Cell 58:1013-1016(1989).

[0468] 142. (DUF10) Uncharacterized protein family UPF0076 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Goat antigen UK114, a
human homolog and the rat corresponding protein which is known as perchloric acid soluble protein (PSP1). PSP1 [2]
may inhibit an initiation stage of cell-free protein synthesis. - Mouse heat-responsive protein HRSP12. - Yeast
chromosome V hypothetical protein YER057c. - Yeast chromosome IX hypothetical protein YIL051c. - Caenorhabditis
elegans hypothetical protein C23G10.2. - Escherichia coli hypothetical protein ycdK. - Escherichia coli hypothetical
protein yhaR. - Escherichia coli hypothetical protein yjgF and HI0719, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yoaB. - Bacillus subtilis hypothetical protein yabJ. - Haemophilus influenzae
hypothetical protein HI1627. - Helicobacter pylori hypothetical protein HP0944. - Lactococcus lactis aldR. - Myxococcus
xanthus dfrA. - Synechocystis strain PCC 6803 hypothetical protein slr0709. - Rhizobium strain NGR234 symbiotic
plasmid hypothetical protein y4sK. - Pyrococcus horikoshii hypothetical protein PH0854. These are small proteins of
around 15 Kd whose sequence is highly conserved. As a signature pattern, a well conserved region located in the
C-terminal part of these proteins was selected.

[0469] Consensus pattern: [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVA]-x(5,8)-[LIVM]-E-[MI]

[ 1] Bairoch A. Unpublished observations (1995).

[ 2] Oka T., Tsuji H., Noda C., Sakai K., Hong Y.-M., Suzuki I., Munoz S., Natori Y. J. Biol. Chem. 270:30060-30067
(1995).

[0470] 143. (DUF3) Domain of Unknown Function 3
Domain apparently occurring exclusively in eubacteria. Unknown function.
[0471] 144. (DUF6) Integral membrane protein
[0472] This family includes many hypothetical membrane proteins of unknown function. Many of the proteins contain
two copies of the aligned region.
[0473] 145. (DUF7) Integral membrane protein

[0474] This family includes many hypothetical membrane proteins of unknown function. Swiss:P14502 has been
implicated in resistance to ethidium bromide.

[0475] 146. (DapB) Dihydrodipicolinate reductase signature

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and
lysine, the NAD or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This
enzyme is present in bacteria (gene dapB) and higher plants. As a signature pattern the best conserved region in this
enzyme was selected. It is located in the central section and is part of the substrate-binding region [1].
[0476] Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A-
[0477] [ 1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).
[0478] 147. DedA family

[0479] This family combines the DedA related proteins and YIAmGIK family. Members of this family are not
functionally characterised. These proteins contain multiple predicted transmembrane regions.
[0480] 148. DegT/DnrJ/EryC1/StrS family

[0481] The members of this family exhibit some characteristics of the sensor protein of two-component signal
transduction systems; however, none of the members show any sequence similarity to these protein kinases. The members
of this family do have the typical helix-turn-helix motif of DNA binding proteins.

[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR, J Bacteriol 1992;174:144-154.
[0483] 149. (Desaturase) Fatty acid desaturases signatures

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position
of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily
related. Family 1 is composed of: - Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme
of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's
such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a
multienzyme complex in the endoplasmic reticulum of vertebrates and fungi. As a signature pattern for this family a
conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in
aromatic residues. Family 2 is composed of: - Plants stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these
enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP.
This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of
vegetable oils. - Cyanobacteria desA [3], an enzyme that can introduce a second cis double bond at the delta(12)
position of fatty acid bound to membranes glycerolipids. DesA is involved in chilling tolerance; the phase transition
temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the
membrane lipids. As a signature pattern for this family a conserved region in the C-terminal part of these enzymes was
selected.

[0484] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y-

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]-

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] 150. Dihydroorotase signatures

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidine, the conversion
of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which is required
for its catalytic activity [1]. In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid residues (gene
pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as 'rudimentary' in Drosophila
and CAD in mammals and which catalyzes the first three steps of pyrimidine biosynthesis [2]. The DHOase domain is
located in the central part of this polyprotein. In yeasts, DHOase is encoded by a monofunctional protein (gene URA4).
However, a defective DHOase domain [3] is found in a multifunctional protein (gene URA2) that catalyzes the first two
steps of pyrimidine biosynthesis. The comparison of DHOase sequences from various sources shows [4] that there
are two highly conserved regions. The first, located in the N-terminal extremity, contains two histidine residues suggested
[3] to be involved in binding the zinc ion. The second is found in the C-terminal part. Signature patterns for both regions
have been developed. Allantoinase (EC 3.5.2.5) is the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene
DAL1) [5], it is the first enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the second
step in the degradation of uric acid. The sequence of allantoinase is evolutionarily related to that of DHOases.
[0486] Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two H's are probable zinc
ligands]

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K-

[ 1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 3] Souciet J.-L., Nagy M., Le Gouar M., Lacroute F., Potier S. Gene 79:59-70(1989).
[ 4] Guyonvarch A., Nguyen-Juilleret M., Hubert J.-C., Lacroute F. Mol. Gen. Genet. 212:134-141(1988).
[ 5] Buckholz R.G., Cooper T.G. Yeast 7:913-923(1991).

[ 6] Hayashi S., Jain S., Chu R., Alvares K., Xu B., Erfurth F., Usuda N., Rao M.S., Reddy S.K., Noguchi T., Reddy
J.K., Yeldandi A.V. J. Biol. Chem. 269:12269-12276(1994).

[0487] 151. dnaJ domains signatures and profile

[0488] The prokaryotic heat shock protein dnaJ interacts with the chaperone hsp70-like dnaK protein [1]. Structurally,
the dnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a
glycine-rich region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR'
domain) and a C-terminal region of 120 to 170 residues. Such a structure is shown in the following schematic
representation:

+------------+-+-------+-----+-----------+--------------------+
| N-terminal | | Gly-R |     | CXXCXGXG  | C-terminal         |
+------------+-+-------+-----+-----------+--------------------+

[0489] It has been shown [2] that the 'J' domain as well as the 'CRR' domain are also found in other prokaryotic and
eukaryotic proteins which are listed below.

a) Proteins containing both a 'J' and a 'CRR' domain:

- Yeast protein MAS5/YDJ1 which seems to be involved in mitochondrial protein import.
- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein folding.
- Yeast protein SCJ1, involved in protein sorting.
- Yeast protein XDJ1.
- Plants dnaJ homologs (from leek and cucumber).
- Human HDJ2, a dnaJ homolog of unknown function.
- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of soybean.
- Escherichia coli cbpA [3], a protein that binds curved DNA.
- Yeast protein SEC63/NPL1, important for protein assembly into the endoplasmic reticulum and the nucleus.
- Yeast protein SIS1, required for nuclear migration during mitosis.
- Yeast protein CAJ1.
- Yeast hypothetical protein YFR041c.
- Yeast hypothetical protein YIR004w.
- Yeast hypothetical protein YJL162c.
- Plasmodium falciparum ring-infected erythrocyte surface antigen (RESA). RESA, whose function is not known,
is associated with the membrane skeleton of newly invaded erythrocytes.
- Human HDJ1.
- Human HSJ1, a neuronal protein.
- Drosophila cysteine-string protein (csp).

[0490] A signature pattern for the 'J' domain was developed, based on conserved positions in the C-terminal half of
this domain. A pattern for the 'CRR' domain, based on the first two copies of that motif, was also developed. A profile
for the 'J' domain was also developed.

[0491] Consensus pattern: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI]-
Consensus pattern: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G-
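The presence or absence of the 'CRR' domain, which separates groups a) and b) above, can be approximated by counting copies of the bare CXXCXGXG motif. The following is a minimal sketch, not part of the original disclosure. The function names, the default threshold of four repeats (taken from the description of dnaJ itself) and the test sequence are assumptions for illustration; the bare motif is looser than the consensus pattern given for the 'CRR' domain.

```python
import re

# Bare CXXCXGXG motif; X is any residue.
CRR_REPEAT = re.compile(r"C..C.G.G")


def count_crr_repeats(seq):
    """Count non-overlapping CXXCXGXG motifs in a one-letter protein sequence."""
    return len(CRR_REPEAT.findall(seq))


def has_crr_domain(seq, min_repeats=4):
    """Hypothetical rule of thumb: dnaJ is described as carrying four repeats."""
    return count_crr_repeats(seq) >= min_repeats


if __name__ == "__main__":
    # Made-up sequence with four motif copies separated by short spacers.
    example = "MK" + "AAAA".join(["CPTCNGRG"] * 4) + "KK"
    print(count_crr_repeats(example), has_crr_domain(example))
```

A sequence that passes this check would still need to be confirmed with the full consensus pattern or the profile before being assigned to group a).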

[1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).

[2] Bork P., Sander C., Valencia A., Bukau B. Trends Biochem. Sci. 17:129-129(1992).

[3] Ueguchi C., Kaneda M., Yamada H., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).
[0492] 152.
[0493] 153. Dwarfin

[0494] This family known as the dwarfins also includes the Drosophila protein MAD. The N-terminus of MAD can
bind to DNA [2].

[0495] [1] Yingling JM, Das P, Savage C, Zhang M, Padgett RW, Wang XF, Proc Natl Acad Sci U S A 1996;93:
8940-8944. [2] Kim J, Johnson K, Chen HJ, Carroll S, Laughon A, Nature 1997;388:304-308.
[0496] 154. Dynein light chain type 1 signature

Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic
cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles
and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size
subunits and small subunits. Among the small subunits, there is a family [1,2] of highly conserved proteins which consist
of: - Chlamydomonas reinhardtii flagellar outer arm dynein 8 Kd and 11 Kd light chains. - Higher eukaryotes cytoplasmic
dynein light chain 1. - Yeast cytoplasmic dynein light chain 1 (gene DYN2 or SLC1). - Caenorhabditis elegans
hypothetical dynein light chains M18.2 and T26A5.9. These proteins have from 89 to 120 amino acids. As a signature pattern,
a highly conserved region was selected.

Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E-

[ 1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11452(1995).

[ 2] Dick T., Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 16:1966-1977(1996).

[0497] 155. dUTPase

[0498] dUTPase hydrolyzes dUTP to dUMP and pyrophosphate.

[0499] [1] Cedergren-Zeppezauer ES, Larsson G, Nyman PO, Dauter Z, Wilson KS, Nature 1992;355:740-743. [2]
Mol CD, Harris JM, McIntosh EM, Tainer JA, Structure 1996;4:1077-1092.

[0500] 156. (dCMP_cyt_deam) Cytidine and deoxycytidylate deaminases zinc-binding region signature

Cytidine deaminase (EC 3.5.4.5) (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia,
while deoxycytidylate deaminase (EC 3.5.4.12) (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes
are known to bind zinc and to require it for their catalytic activity [1,2]. These two enzymes do not share any sequence
similarity with the exception of a region that contains three conserved histidine and cysteine residues which are thought
to be involved in the binding of the catalytic zinc ion. Such a region is also found in other proteins [3,4]:

- Yeast cytosine deaminase (EC 3.5.4.1) (gene FCY1), which transforms cytosine into uracil.
- Mammalian apolipoprotein B mRNA editing protein, responsible for the posttranscriptional editing of a CAA codon
into a UAA (stop) codon in the APOB mRNA.
- Riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidinone 5'-phosphate
into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate.
- Bacillus cereus blasticidin-S deaminase (EC 3.5.4.23), which catalyzes the deamination of the cytosine moiety of
the antibiotics blasticidin S, cytomycin and acetylblasticidin S.
- Bacillus subtilis protein comEB. This protein is required for the binding and uptake of transforming DNA.
- Bacillus subtilis hypothetical protein yaaJ.
- Escherichia coli hypothetical protein yfhC.
- Yeast hypothetical protein YJL035c.

A signature pattern for this zinc-binding region was derived.

[0501] Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM] [The C's and
H are zinc ligands]

[1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).
[2] Moore J.T., Silversmith R.E., Maley G.F., Maley F. J. Biol. Chem. 268:2288-2291(1993).
[3] Reizer J., Buskirk S., Bairoch A., Reizer A., Saier M.H. Jr. Protein Sci. 3:853-856(1994).
[4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem. Sci. 19:105-106(1994).

[0502] 157. Dehydrins signatures

A number of proteins are produced by plants that experience water-stress. Water-stress takes place when the water
available to a plant falls below a critical level. The plant hormone abscisic acid (ABA) appears to modulate the response
of plants to water-stress. Proteins that are expressed during water-stress are called dehydrins [1,2] or LEA group 2
proteins [3]. The proteins that belong to this family are listed below.

- Arabidopsis thaliana XERO 1, XERO 2 (LTI30), RAB18, ERD10 (LTI45), ERD14 and COR47.
- Barley dehydrins B8, B9, B17, and B18.
- Cotton LEA protein D-11.
- Craterostigma plantagineum desiccation-related proteins A and B.
- Maize dehydrin M3 (RAB-17).
- Pea dehydrins DHN1, DHN2, and DHN3.
- Radish LEA protein.
- Rice proteins RAB 16B, 16C, 16D, RAB21, and RAB25.
- Tomato TAS14.
- Wheat dehydrin RAB 15 and cold-shock proteins cor410, cs66 and cs120.

[0503] Dehydrins share a number of structural features. One of the most notable features is the presence, in their
central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has
been found in all known dehydrins so far with the exception of pea dehydrins. A second conserved feature is the
presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues
that follows the poly-serine region and the second copy is found at the C-terminal extremity. Signature patterns for
both regions were derived.

[0504] Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G

[1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).
[2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).
[3] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol.
12:475-486(1989).
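The consensus patterns in this listing use a PROSITE-style notation: residues are separated by hyphens, `x` stands for any residue, `[...]` lists the residues allowed at a position, `{...}` lists residues that are excluded, and a parenthesized number or range gives a repeat count. As a minimal sketch, and not the method used in the present experiments, such a pattern can be translated into a regular expression and applied to a protein sequence. The function name `prosite_to_regex` and the sequence fragment are hypothetical and serve only as an illustration.

```python
import re

# One pattern element: x, a single residue, an allowed set [..] or an
# excluded set {..}, optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip("-").split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            regex = core                       # literal residue or allowed set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# The second dehydrin signature (lysine-rich octapeptide) from the text.
k_segment = prosite_to_regex("[KR]-[LIM]-K-[DE]-K-[LIM]-P-G")
fragment = "EKKGIMDKIKEKLPG"  # hypothetical fragment, for illustration only
match = re.search(k_segment, fragment)
print(k_segment, match.start() + 1 if match else None)
```

The printed position is 1-based, as is usual for residue numbering.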

[0505] 158. (deoR) Bacterial regulatory proteins, deoR family signature

The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified
into subfamilies on the basis of sequence similarities. One of these subfamilies groups the following proteins [1,2]:

- accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and conjugal transfer.
- agaR, the Escherichia coli aga operon putative repressor.
- deoR, the Escherichia coli deoxyribose operon repressor.
- fucR, the Escherichia coli L-fucose operon activator.
- gatR, the Escherichia coli galactitol operon repressor.
- glpR, the Escherichia coli glycerol-3-phosphate regulon repressor.
- gutR (or srlR), the Escherichia coli glucitol operon repressor.
- iolR, from Bacillus subtilis.
- lacR, the streptococci lactose phosphotransferase system repressor.
- spoIIID, the Bacillus subtilis transcription regulator of the sigK gene.
- yfjR, an Escherichia coli hypothetical protein.
- ygbI, an Escherichia coli hypothetical protein.
- yihW, an Escherichia coli hypothetical protein.
- yjfQ, an Escherichia coli hypothetical protein.
- yjhJ, an Escherichia coli hypothetical protein.

The 'helix-turn-helix' DNA-binding motif of these proteins is located in the N-terminal part of the sequence. The
pattern used to detect these proteins starts fourteen residues before the HTH motif and ends one residue after it.

[0506] Consensus pattern: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMF]

[1] von Bodman S., Hayman G.T., Farrand S.K. Proc. Natl. Acad. Sci. U.S.A. 89:643-647(1992).
[2] Bairoch A. Unpublished observations (1993).

[0507] 159. dsrm
Double-stranded RNA binding motif
[1] Burd CG, Dreyfuss G; Medline: 94310455. Conserved structures and diversity of functions of RNA-binding proteins.
Science 1994;265:615-621.

[0508] Sequences gathered for seed by HMM_iterative_training. Putative motif shared by proteins that bind to dsRNA.
At least some DSRM proteins seem to bind to specific RNA targets. Exemplified by Staufen, which is involved in
localization of at least five different mRNAs in the early Drosophila embryo. Also by interferon-induced protein kinase
in humans, which is part of the cellular response to dsRNA.
[0509] Number of members: 116
[0510] 160. Dynamin family signature

Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is involved in the production of
microtubule bundles and which is able to bind and hydrolyze GTP. Dynamin is structurally related to the following
proteins:

- Drosophila shibire protein (gene shi) [3]. Shibire is, very probably, the Drosophila cognate of mammalian
dynamin. It seems to provide the motor for vesicular transport during endocytosis.
- Yeast vacuolar sorting protein VPS1 (or SPO15) [4], a protein which could also be involved in
microtubule-associated motility.
- Yeast protein MGM1 [5], which is required for mitochondrial genome maintenance.
- Yeast protein DNM1, which is involved in endocytosis.
- Interferon induced Mx proteins [6,7]. Interferon alpha or beta induce the synthesis of a family of closely related
proteins. Most of these proteins are known to confer resistance to influenza viruses and/or rhabdoviruses on
transfected mammalian cells in culture.

The three motifs found in all GTP-binding proteins are located in the N-terminal part of these
proteins. The signature pattern that was developed for these proteins is based on a highly conserved region downstream
of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>).
[0511] Consensus pattern: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R

[1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).
[2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-261(1990).
[3] van der Bliek A., Meyerowitz E.M. Nature 351:411-414(1991).
[4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-1074(1990).
[5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).
[6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).
[7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).

[0512] 161. (dynamin_2) Dynamin central region

[0513] This region lies between the GTPase domain (see dynamin) and the pleckstrin homology (PH) domain.
[0514] 162. E1-E2 ATPases phosphorylation site

E1-E2 ATPases (also known as P-type) are cation transport ATPases which form an aspartyl phosphate intermediate
in the course of ATP hydrolysis. ATPases which belong to this family are listed below [1,2,3].

- Fungal and plant plasma membrane (H+) ATPases [reviewed in 4].
- Vertebrate (Na+, K+) ATPases (sodium pump) [reviewed in 5,6].
- Gastric (K+, H+) ATPases (proton pump).
- Calcium (Ca++) ATPases (calcium pump) from the sarcoplasmic reticulum (SR), the endoplasmic reticulum (ER)
and the plasma membrane.
- Copper (Cu++) ATPases (copper pump) which are involved in two human genetic disorders: Menkes syndrome
and Wilson disease [7].
- Bacterial potassium (K+) ATPases.
- Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8].
- Bacterial magnesium (Mg++) ATPases.
- A probable cation ATPase from Leishmania.
- fixI, a probable cation ATPase from Rhizobium meliloti, involved in nitrogen fixation.

The region around the phosphorylated aspartate residue is perfectly conserved in all these ATPases and can be used
as a signature pattern.

[0515] Consensus pattern: D-K-T-G-T-[LI]-[TI] [D is phosphorylated]

[1] Green N.M., McLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).
[2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).
[3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).
[4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).
[5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).
[6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).
[7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).
[8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).
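Because the E1-E2 ATPase signature has no variable-length gaps, it maps directly onto a short regular expression. The following is a minimal sketch, assuming the pattern D-K-T-G-T-[LI]-[TI] given above; the function name and the sequence fragment are hypothetical. It reports the 1-based position of the aspartate that opens each match, which is the residue the text identifies as phosphorylated.

```python
import re

# D-K-T-G-T-[LI]-[TI]; the leading D is the phosphorylated aspartate.
P_TYPE_SITE = re.compile(r"DKTGT[LI][TI]")


def find_phosphorylation_sites(sequence: str) -> list[int]:
    """Return the 1-based position of the aspartate starting each signature match."""
    return [m.start() + 1 for m in P_TYPE_SITE.finditer(sequence.upper())]


fragment = "AICSDKTGTLTQNRM"  # hypothetical fragment, for illustration only
print(find_phosphorylation_sites(fragment))  # prints [5]
```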

[0516] 163. E1_N

E1 Protein, N terminal domain
Number of members: 90

[0517] 164. (E1_dehydrog) Dehydrogenase E1 component

[0518] This family uses thiamine pyrophosphate as a cofactor. This family includes pyruvate dehydrogenase,
2-oxoglutarate dehydrogenase and 2-oxoisovalerate dehydrogenase.
[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3-2trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two
enzymes involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA
and ECI shifts the 3- double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most
eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes.
In mitochondria, ECH and ECI are separate yet structurally related monofunctional enzymes. Peroxisomes contain a
trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal
domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) and
Pseudomonas fragi (gene faoA), ECH and ECI are also part of a multifunctional enzyme which contains both a HCDH and
a 3-hydroxybutyryl-CoA epimerase domain [4]. A number of other proteins have been found to be evolutionarily related
to the ECH/ECI enzymes or domains:

- 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme involved in the
butyrate/butanol-producing pathway.
- Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene menB) [5], a bacterial enzyme involved in the
biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts O-succinyl-benzoyl-CoA (OSB-CoA) to
1,4-dihydroxy-2-naphthoic acid (DHNA).
- 4-chlorobenzoate dehalogenase (EC 3.8.1.6) [6], a Pseudomonas enzyme which catalyzes the conversion of
4-chlorobenzoate-CoA to 4-hydroxybenzoate-CoA.
- A Rhodobacter capsulatus protein of unknown function (ORF257) [7].
- Bacillus subtilis putative polyketide biosynthesis proteins pksH and pksI.
- Escherichia coli carnitine racemase (gene caiD) [8].
- Escherichia coli hypothetical protein ygfG.
- Yeast hypothetical protein YDR036c.

As a signature pattern for these enzymes, a conserved region rich in glycine and hydrophobic residues was selected.

[0520] Consensus pattern: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-[DQHP]-
[LIVMFY]

[1] Minami-Ishii N., Taketani S., Osumi T., Hashimoto T. Eur. J. Biochem. 185:73-78(1989).
[2] Mueller-Newen G., Stoffel W. Biol. Chem. Hoppe-Seyler 372:613-624(1991).
[3] Palosaari P.M., Hiltunen J.K. J. Biol. Chem. 265:2446-2449(1990).
[4] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).
[5] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).
[6] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H.,
Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[7] Beckman D.L., Kranz R.G. Gene 107:171-172(1991).
[8] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).

[0521] 166. (EF1BD) Elongation factor 1 beta/beta'/delta chain signatures

Eukaryotic elongation factor 1 (EF-1) is responsible for the GTP-dependent binding of aminoacyl-tRNAs to the
ribosomes [1]. EF-1 is composed of four subunits: the alpha chain which binds GTP and aminoacyl-tRNAs, the gamma
chain that probably plays a role in anchoring the complex to other cellular components and the beta and delta (or beta')
chains. The beta and delta chains are highly similar proteins that both stimulate the exchange of GDP bound to the
alpha chain for GTP [2]. The beta and delta chains are hydrophilic proteins of around 23 to 31 Kd. Their C-terminal
part seems important for the nucleotide exchange activity, while the N-terminal section is probably involved in the
interaction with the gamma chain. Two signature patterns for this family of proteins were developed. The first
corresponds to an acidic region in the central section; the second, to the C-terminal extremity of these proteins.
[0522] Consensus pattern: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G
Consensus pattern: [IV]-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM]

[1] Riis B., Rattan I.S., Clark B.F.C., Merrick W.C. Trends Biochem. Sci. 15:420-424(1990).
[2] van Damme H.T.F., Amons R., Karssies R., Timmers C.J., Janssen G.M.C., Moeller W. Biochim. Biophys. Acta
1050:241-247(1990).

[0523] 167. (EF1G_domain) Elongation factor 1 gamma, conserved domain
[0524] 168. (EFG_C) Elongation factor G C-terminus

[0525] This family is always found associated with GTP_EFTU. This family includes the carboxyl terminal regions
of elongation factor G, elongation factor 2 and some tetracycline resistance proteins.
[0526] 169. (EFP) Elongation factor P signature

Elongation factor P (EF-P) [1] is a prokaryotic protein translation factor required for efficient peptide bond synthesis
on 70S ribosomes from fMet-tRNAfMet. EF-P is a protein of 21 Kd. It is evolutionarily related to yeiP, a hypothetical
protein from Escherichia coli. As a signature pattern, a conserved region located in the C-terminal part of these proteins
was selected.

[0527] Consensus pattern: K-x-[AV]-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G
[1] Aoki H., Adams S.-L., Turner M.A., Ganoza M.C. Biochimie 79:7-11(1997).
[0528] 170. (EF_TS) Elongation factor Ts signatures

In prokaryotes elongation factor Ts (EF-Ts) is a component of the elongation cycle of protein biosynthesis. It associates
with the EF-Tu.GDP complex and induces the exchange of GDP to GTP; it remains bound to the aminoacyl-tRNA.EF-Tu.GTP
complex up to the GTP hydrolysis stage on the ribosome [1]. EF-Ts is also a component of the chloroplast
protein biosynthetic machinery and is encoded in the genome of some algal chloroplasts [2]. It is also present in
mitochondria [3]. As signature patterns for EF-Ts, two conserved regions located in the N-terminal part of the protein have
been selected.

[0529] Consensus pattern: L-R-x(2)-T-[GSDNQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-A-L
[0530] Consensus pattern: E-[LIVM]-N-[SCV]-[QE]-T-D-F-V-[SA]-[KRN]

[1] Bubunenko M.G., Kireeva M.L., Gudkov A.T. Biochimie 74:419-425(1992).
[2] Kostrzewa M., Zetsche K. Plant Mol. Biol. 23:67-76(1993).
[3] Xin H., Woriax V.L., Burkhart W.A., Spremulli L.L. J. Biol. Chem. 270:17243-17249(1995).

[0531] 171. (EMP24_GP25L) emp24/gp25L/p24 family

[0532] Members of this family are implicated in bringing cargo forward from the ER and binding to coat proteins by
their cytoplasmic domains. Number of members: 30
[0533] Paccaud JP, Thomas DY, Bergeron JJ, Nilsson T, J Cell Biol 1998;140:751-765.

172. ENV_polyprotein

ENV polyprotein (coat polyprotein)

Number of members: 224

[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures

Two fungal enzymes involved in ergosterol biosynthesis and which act by reducing double bonds in precursors of
ergosterol have been shown to be evolutionarily related [1]. These are C-14 sterol reductase (gene ERG24 in budding
yeast and erg3 in Neurospora crassa) and C-24(28) sterol reductase (gene ERG4 in budding yeast and sts1 in fission
yeast). Their sequences are also highly related to that of chicken lamin B receptor, which is thought to anchor the
lamina to the inner nuclear membrane. These proteins are highly hydrophobic and seem to contain seven or eight
transmembrane regions. As signature patterns, two conserved regions were selected. The first one is apparently
located in a loop between the fourth and fifth transmembrane regions and the second is in the C-terminal section.
[0535] Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYW]-x-G-x(2)-L-N-P-R
Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G

[1] Lai M.H., Bard M., Pierson C.A., Alexander J.R., Goebl M., Carter G.T., Kirsch D.R. Gene 140:41-49(1994).
[0536] 174. (ERM) Ezrin/radixin/moesin family

[0537] This family of proteins contains a band 4.1 domain (Band_41) at the amino terminus. This family represents
the rest of these proteins.

[0538] [1] Yonemura S, Hirao M, Doi Y, Takahashi N, Kondo T, Tsukita S, J Cell Biol 1998;140:885-895.

[0539] 175. ER lumen protein retaining receptor signatures

Proteins that reside in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally
K-D-E-L or H-D-E-L) that serves as a signal for their retrieval (retrograde transport) from subsequent compartments of the
secretory pathway. The signal is recognized by a receptor molecule that is believed to cycle between the cis side of
the Golgi apparatus and the ER [1]. This protein is known as the ER lumen protein retaining receptor or also as the
'KDEL receptor'. It has been characterized in a variety of species, including fungi (gene ERD2), plants, Plasmodium,
Drosophila and mammals. In mammals two highly related forms of the receptor are known. Structurally, the receptor
is a protein of about 220 residues that seems to contain seven transmembrane regions [2]. The N-terminal part (3
residues) is oriented toward the lumen while the C-terminal tail (about 12 residues) is cytoplasmic. There are three
lumenal and three cytoplasmic loops. Two signature patterns for these receptors were developed. The first pattern
corresponds to the C-terminal half of the first cytoplasmic loop as well as most of the second transmembrane domain.
The second pattern is a perfectly conserved decapeptide that corresponds to the central part of the fifth transmembrane
domain.

[0540] Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L

[1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991).
[2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993).
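The retrieval signal described in this entry is positional: the tetrapeptide acts as a signal only at the C-terminal end of the protein. A minimal sketch of such a check follows, assuming only the two tetrapeptides named in the text (K-D-E-L and H-D-E-L); the function name and the test sequences are hypothetical, and other variants covered by "generally" in the text are not modelled.

```python
def has_er_retention_signal(sequence: str) -> bool:
    """Return True when the sequence ends with KDEL or HDEL.

    Only the C-terminal position counts; an internal KDEL is ignored.
    """
    return sequence.upper().endswith(("KDEL", "HDEL"))


# Hypothetical sequences, for illustration only.
for seq in ("MKLSAAVTGEKDEL", "MKDELSAAVTGE"):
    print(seq, has_er_retention_signal(seq))
```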

[0541] 176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature

The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a
heterodimer that consists of an alpha and a beta subunit and which binds one molecule of FAD per dimer. A similar system
also exists in some bacteria. The beta subunit of ETF is a protein of about 28 Kd which is structurally related to the
bacterial nitrogen fixation protein fixA which could play a role in a redox process and feed electrons to ferredoxin. Other
related proteins are: - Escherichia coli hypothetical protein ydiQ. - Escherichia coli hypothetical protein ygcR. As a
signature pattern for these proteins, a conserved region which is located in the central section was selected.
[0542] Consensus pattern: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-x-[LIVM](2)-[TAC]

[1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[0543] 177. Endonuclease III signatures

Escherichia coli endonuclease III (EC 4.2.99.18) (gene nth) [1] is a DNA repair enzyme that acts both as a DNA
N-glycosylase, removing oxidized pyrimidines from DNA, and as an apurinic/apyrimidinic (AP) endonuclease, introducing
a single-strand nick at the site from which the damaged base was removed. Endonuclease III is an iron-sulfur protein
that binds a single 4Fe-4S cluster. The 4Fe-4S cluster does not seem to be important for catalytic activity, but is probably
involved in the proper positioning of the enzyme along the DNA strand [2]. Endonuclease III is evolutionarily related to
the following proteins:

- Fission yeast endonuclease III homolog (gene nth1) [3].
- Escherichia coli and related protein DNA repair protein mutY, which is an adenine glycosylase. MutY is a larger
protein (350 amino acids) than endonuclease III (211 amino acids).
- Micrococcus luteus ultraviolet N-glycosylase/AP lyase which initiates repair at cis-syn pyrimidine dimers.
- ORF10 in plasmid pFV1 of the thermophilic archaebacteria Methanobacterium thermoformicicum [4]. Restriction
methylase m.MthTI, which is encoded by this plasmid, generates 5-methylcytosine which is subject to deamination
resulting in G-T mismatches. This protein could correct these mismatches.
- Yeast hypothetical protein YAL015c.
- Fission yeast hypothetical protein SpAC26A3.02.
- Caenorhabditis elegans hypothetical protein R10E4.5.
- Methanococcus jannaschii hypothetical protein MJ0613.

The 4Fe-4S cluster is bound by four cysteines which are all located in
a 17 amino acid region at the C-terminal end of endonuclease III. A similar region is also present in the central section
of mutY and in the C-terminus of ORF10 and of the Micrococcus UV endonuclease. The 4Fe-4S cluster region does
not exist in YAL015c. Two signature patterns for these proteins were developed: the first corresponds to the core of
the iron-sulfur binding domain, the second corresponds to the best conserved region in the catalytic core of these
enzymes.

[0544] Consensus pattern: C-x(3)-[KRS]-P-[KRAGL]-C-x(2)-C-x(5)-C [The four C's are 4Fe-4S ligands]
Consensus pattern: [GST]-x-[LIVMF]-P-x(5)-[LIVMW]-x(2,3)-[LI]-[PAS]-G-V-[GA]- ... -[LIVMFYW]-[GANK]



[1] Kuo C.-F., McRee D., Fisher C.L., O'Handley S.F., Cunningham R.P., Tainer J.A. Science 258:434-440(1992).
[2] Thomson A.J. Curr. Biol. 3:173-174(1993).
[3] Roldan-Arjona T., Anselmino C., Lindahl T. Nucleic Acids Res. 3307-3312(1996).
[4] Noelling J., van Eeden F.J.M., Eggen R.I.L., de Vos W.M. Nucleic Acids Res. 20:6501-6507(1992).

[0545] 178. (Epimerase) NAD dependent epimerase/dehydratase family

[0546] This family of proteins utilizes NAD as a cofactor. The proteins in this family use nucleotide-sugar substrates
for a variety of chemical reactions.
[0547] [1] Thoden JB, Hegeman AD, Wesenberg G, Chapeau MC, Frey PA, Holden HM, Biochemistry 1997;36:6294-6304.

[0548] 179. Exonuclease 

[0549] This family includes a variety of exonuclease proteins, such as ribonuclease T and the epsilon subunit of DNA 
polymerase III. 

[0550] [1] Koonin EV, Deutscher MP, Nucleic Acids Res 1993;21:2521-2522.
[0551] 180. ENTH 
ENTH domain 

[0552] [1] Kay BK, Yamabhai M, Wendland B, Emr SD; Medline: 99156083. Identification of a novel domain shared
by putative components of the endocytic and cytoskeletal machinery. Protein Sci 1999;8:435-438.
[0553] The ENTH (Epsin N-terminal homology) domain is found in proteins involved in endocytosis and cytoskeletal
machinery. The function of the ENTH domain is unknown.
[0554] Number of members: 29 

[0555] 181. (eIF-1A) Eukaryotic initiation factor 1A signature

Eukaryotic translation initiation factor 1A (eIF-1A) [1] (formerly known as eIF-4C) is a protein that seems to be required
for maximal rate of protein biosynthesis. It enhances ribosome dissociation into subunits and stabilizes the binding of
the initiator Met-tRNA to 40S ribosomal subunits. eIF-1A is a hydrophilic protein of about 15 to 17 Kd. Archaebacteria
also seem to possess an eIF-1A homolog. As a signature pattern, a conserved region in the central section of these
proteins was selected.

[0556] Consensus pattern: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G
[0557] [1] Wei C.-L., Kainuma M., Hershey J.W.B. J. Biol. Chem. 270:22788-22794(1995).
[0558] 182. (eIF-5A) Eukaryotic initiation factor 5A hypusine signature

Eukaryotic initiation factor 5A (eIF-5A) (formerly known as eIF-4D) [1,2] is a small protein whose precise role in the
initiation of protein synthesis is not known. It appears to promote the formation of the first peptide bond. eIF-5A seems
to be the only eukaryotic protein to contain a hypusine residue. Hypusine is derived from lysine by the post-translational
addition of a butylamino group (from spermidine) to the epsilon-amino group of lysine. The hypusine group is essential
to the function of eIF-5A. A hypusine-containing protein has been found in archaebacteria such as Sulfolobus
acidocaldarius or Methanococcus jannaschii; this protein is highly similar to eIF-5A and could play a similar role in protein
biosynthesis. The signature developed for eIF-5A is centered around the hypusine residue.
[0559] Consensus pattern: [PT]-G-K-H-G-x-A-K [The first K is modified to hypusine]

[1] Park M.H., Wolff E.C., Folk J.E. Biofactors 4:95-104(1993).
[2] Schnier J., Schwelberger H.G., Smit-McBride Z., Kang H.A., Hershey J.W.B. Mol. Cell. Biol. 11:3105-3114
(1991).
[0560] 183. (efhand) S-100/ICaBP type calcium binding protein signature

S-100 are small dimeric acidic calcium and zinc-binding proteins [1] abundant in the brain. They have two different
types of calcium-binding sites: a low affinity one with a special structure and a 'normal' EF-hand type high affinity site.
The vitamin-D dependent intestinal calcium-binding proteins (ICaBP or calbindin 9 Kd) also belong to this family of
proteins, but it does not form dimers. In the past years the sequences of many new members of this family have been
determined (for reviews see [2,3,4]); in most cases the function of these proteins is not yet known, although it is
becoming clear that they are involved in cell growth and differentiation, cell cycle regulation and metabolic control. These
proteins are:

- Calcyclin (Prolactin receptor associated protein (PRA); caltropin; 2a9; 5B10; S100A6).
- Calpactin I light chain (p10; p11; 42c; S100A10).
- Calgranulin A (cystic fibrosis antigen (CFAg); MIF related protein 8 (MRP-8); p8; S100A8).
- Calgranulin B (MIF related protein 14 (MRP-14); p14; S100A9).
- Calgranulin C.
- Calgizzarin (S100C).
- Placental calcium-binding protein (CAPL) (18a2; pEL98; 42a; p9K; MTS1; metastatin; S100A4).
- Protein S-100D (S100A5).
- Protein S-100E (S100A3).
- Protein S-100L (CAN19; S100A2).
- Placental protein S-100P (S100E).
- Psoriasin (S100A7).
- Chemotactic cytokine CP-10 [5].
- Protein MRP-126 [6].
- Trichohyalin [7]. This is a large intermediate filament-associated protein that associates with keratin intermediate
filaments (KIF); it contains a S-100 type domain in its N-terminal extremity.

A number of these proteins are known to bind calcium while others are not (p10 for example).
Our EF-hand detecting pattern will fail to pick those proteins which have lost their calcium-binding properties. A pattern
was developed which unambiguously picks up proteins belonging to this family. This pattern spans the region of the
EF-hand high affinity site but makes no assumptions on the calcium-binding properties of this site.
[0561] Consensus pattern: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-
[LIVMF]

[1] Baudier J. (In) Calcium and Calcium Binding proteins, Gerday C., Bollis L., Giller R., Eds., pp102-113, Springer
Verlag, Berlin, (1986).
[2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[4] Schaefer B.W., Wicki R., Engelkamp D., Mattei M.-G., Heizmann C.W. Genomics 25:638-643(1995).
[5] Lackmann M., Cornish C.J., Simpson R.J., Moritz R.L., Geczy C.L. J. Biol. Chem. 267:7499-7504(1992).
[6] Nakano T., Graf T. Oncogene 7:527-534(1992).
[7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol. Chem. 268:12164-12176
(1993).

EF-hand calcium-binding domain

Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain
known as the EF-hand [1 to 5]. This type of domain consists of a twelve residue loop flanked on both sides by a twelve
residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal
configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X,
Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand).
Listed below are the proteins which are known to contain EF-hand regions. For each type of protein the total number
of EF-hand regions known or supposed to exist is indicated between parentheses. This number does not include regions
which clearly have lost their calcium-binding properties, or the atypical low-affinity site (which spans thirteen residues)
found in the S-100/ICaBP family of proteins [6].

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
- Fimbrin (plastin) (Ca=2).
- Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).
- Guanylate cyclase activating protein (GCAP) (Ca=3).
- Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1 (Ca=2) [10].
- Intestinal calcium-binding protein (ICaBPs) (Ca=2).
- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).
- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that contain an 'osteonectin' domain
(QR1, matrix glycoprotein SC1) (see the entry <PDOC00535>) (Ca=1).
- Parvalbumins alpha and beta (Ca=2).
- Placental calcium-binding protein (18a2) (nerve growth factor induced protein 42a) (p9k) (Ca=2).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
- Serine/threonine protein phosphatase rdgC (EC 3.1.3.16) from Drosophila (Ca=2).
- Sorcin V19 from hamster (Ca=2).
- Spectrin alpha chain (Ca=2).
- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
- Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from arthropods and molluscs (Ca=2).

There have been a number of attempts [7,8] to develop patterns that pick up EF-hand regions, but these studies were made a few years ago, when not so many different families of calcium-binding proteins were known. Therefore a new pattern was developed which takes into account all published sequences. This pattern includes the complete EF-hand loop as well as the first residue which follows the loop and which seems to always be hydrophobic.

- Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]

- Note: positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved.

- Note: the 6th residue in an EF-hand loop is, in most cases, a Gly, but the number of exceptions to this 'rule' has gradually increased and therefore the pattern should include all the different residues which have been shown to exist in this position in functional Ca-binding sites.

- Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins with multiple EF-hand domains.
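The consensus patterns in this section use a compact notation: x is any residue, [..] lists the residues allowed at a position, {..} lists the residues excluded from a position, and (n) or (n,m) gives a repeat count. As an illustration only, the following Python sketch translates such a pattern into a regular expression and applies it to a test peptide. The function name prosite_to_regex, the variable names and the 13-residue test peptide (modelled on a calmodulin-type loop) are assumptions made for this example and are not part of the described procedure.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as x(2) or x(1,2).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            token = core                      # single residue or [set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# EF-hand pattern as read from the text above (assumed transcription).
EF_HAND = ("D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-"
           "[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]")
EF_HAND_REGEX = prosite_to_regex(EF_HAND)

if __name__ == "__main__":
    test_peptide = "DKDGDGTITTKEL"  # hypothetical 13-residue test loop
    print(EF_HAND_REGEX)
    print(bool(re.fullmatch(EF_HAND_REGEX, test_peptide)))
```

Because the pattern covers only the 12-residue loop plus the following hydrophobic residue, a whole protein sequence would be scanned with re.finditer rather than re.fullmatch.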

[ 1] Kawasaki H., Kretsinger R.H. Protein Prof. 2:305-490(1995).
[ 2] Kretsinger R.H. Cold Spring Harbor Symp. Quant. Biol. 52:499-510(1987).
[ 3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[ 4] Nakayama S., Moncrief N.D., Kretsinger R.H. J. Mol. Evol. 34:416-448(1992).
[ 5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).
[ 6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).
[ 7] Strynadka N.C.J., James M.N.G. Annu. Rev. Biochem. 58:951-98(1989).
[ 8] Haiech J., Sallantin J. Biochimie 67:555-560(1985).
[ 9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M., Bairoch A. Biochem. J. 265:261-265(1990).
[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).

[0562] 184. Enolase signature

Enolase (EC 4.2.1.11) is a glycolytic enzyme that catalyzes the dehydration of 2-phospho-D-glycerate to phosphoenolpyruvate [1]. It is a dimeric enzyme that requires magnesium both for catalysis and for stabilizing the dimer. Enolase is probably found in all organisms that metabolize sugars. In vertebrates, there are three different tissue-specific isozymes: alpha, present in most tissues; beta, in muscles; and gamma, found only in nervous tissues. Tau-crystallin, one of the major lens proteins in some fish, reptiles and birds, has been shown [2] to be evolutionarily related to enolase. As a signature pattern for enolase, the best conserved region was selected; it is located in the C-terminal third of the sequence.

[0563] Consensus pattern: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]
[ 1] Lebioda L., Stec B., Brewer J.M. J. Biol. Chem. 264:3685-3693(1989).
[ 2] Wistow G., Piatigorsky J. Science 236:1554-1556(1987).
[0564] 185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed end), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. The alpha subunit is a protein of about 268 to 286 amino acid residues whose sequence is well conserved in eukaryotic species [1]. As signature patterns, two highly conserved regions in the C-terminal section of the alpha subunit were selected.

[0565] Consensus pattern: V-H-[FY](2)-E-D-G-N-V
Consensus pattern: F-K-[AE]-L-R-R-x-L-P
[0566] [ 1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. Cell Motil. Cytoskeleton 18:204-214(1991).
[0567] 186. F-box domain

[0568] [1] Bai C, Sen P, Hofmann K, Ma L, Goebl M, Harper JW, Elledge SJ; Cell 1996;86:263-274. [2] Skowyra D, Craig KL, Tyers M, Elledge SJ, Harper JW; Cell 1997;91:209-219.
[0569] 187. F-protein

Negative factor (F protein), or Nef.

[0570] [1] Arold S, Franken P, Strub M-P, Hoh F, Benichou S, Benarous R, Dumas C; Medline: 98035457, The crystal structure of HIV-1 Nef protein bound to the Fyn kinase SH3 domain suggests a role for this complex in altered T cell receptor signalling. Structure 1997;5:1361-1372.
[0571] Nef protein accelerates virulent progression of AIDS by its interaction with cellular proteins involved in signal transduction and host cell activation. Nef has been shown to bind specifically to a subset of the Src kinase family.
[0572] Number of members: 1013
[0573] 188. (FAD_binding_2)

Fumarate reductase / succinate dehydrogenase FAD-binding site. In bacteria, two distinct, membrane-bound enzyme complexes are responsible for the interconversion of fumarate and succinate (EC 1.3.99.1): fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulfur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B.
[0574] In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0575] The flavoprotein subunit is a protein of about 60 to 70 Kd to which FAD is covalently bound at a histidine residue located in the N-terminal section of the protein [1]. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species [2] and can be used as a signature pattern.
[0576] Consensus pattern: R-[ST]-H-[ST]-x(2)-A-x-G-G [H is the FAD binding site] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Blaut M., Whittaker K., Valdovinos A., Ackrell B.A., Gunsalus R.P., Cecchini G. J. Biol. Chem. 264:13599-13604(1989).
[ 2] Birch-Machin M.A., Farnsworth L., Ackrell B.A., Cochran B., Jackson S., Bindoff L.A., Aitken A., Diamond A.G., Turnbull D.M. J. Biol. Chem. 267:11553-11558(1992).

[0577] 189. Fatty acid desaturases signatures (FA_desaturase)

Fatty acid desaturases (EC 1.14.99.-) are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids. There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily related.

Family 1 is composed of:
- Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoA's such as palmitoleoyl- and oleoyl-CoA. SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic reticulum of vertebrates and fungi.
As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected; this region is rich in histidine residues and in aromatic residues.

Family 2 is composed of:
- Plant stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]. These enzymes catalyze the introduction of a double bond at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils.
- Cyanobacterial desA [3], an enzyme that can introduce a second cis double bond at the delta(12) position of fatty acids bound to membrane glycerolipids. DesA is involved in chilling tolerance, the phase transition temperature of lipids of cellular membranes being dependent on the degree of unsaturation of fatty acids of the membrane lipids.
As a signature pattern for this family, a conserved region in the C-terminal part of these enzymes was selected.

[0578] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y
Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0579] 190. Fructose-1-6-bisphosphatase active site (FBPase)

Fructose-1,6-bisphosphatase (EC 3.1.3.11) (FBPase) [1], a regulatory enzyme in gluconeogenesis, catalyzes the hydrolysis of fructose 1,6-bisphosphate to fructose 6-phosphate. It is involved in many different metabolic pathways and found in most organisms. Sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37) (SBPase) [2] is an enzyme found in plant chloroplasts and in photosynthetic bacteria that catalyzes the hydrolysis of sedoheptulose 1,7-bisphosphate to sedoheptulose 7-phosphate, a step in the Calvin reductive pentose phosphate cycle. It is functionally and structurally related to FBPase. In mammalian FBPase, a lysine residue has been shown to be involved in the catalytic mechanism [3]. The region around this residue is highly conserved and can be used as a signature pattern for FBPase and SBPase. It must be noted that, in some bacterial FBPase sequences, the active site lysine is replaced by an arginine.
Consensus pattern: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA] [K/R is the active site residue]
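The FBPase pattern contains a variable-length gap, x(1,2), and an active-site position that may be lysine or arginine. The following Python sketch shows one way such a pattern could be scanned against a sequence while reporting which residue occupies the active-site position. The regular expression is a hand transcription of the pattern as read here, and the function name and both test sequences are hypothetical and serve only as an illustration.

```python
import re

# Assumed regex form of [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].
# The capturing group isolates the putative active-site residue (K or R).
FBPASE = re.compile(r"[AG]([RK])L.{1,2}[LIV][FY]E.{2}P[LIVM][GSA]")


def active_site_residues(sequence):
    """Return (1-based position, residue) for each putative active-site K/R."""
    return [(m.start(1) + 1, m.group(1)) for m in FBPASE.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical sequences: one with a single-residue gap and a lysine,
    # one with a two-residue gap and an arginine.
    print(active_site_residues("MMGKLDVYEVVPMGTT"))
    print(active_site_residues("AAGRLDDIFEAAPLSAA"))
```

Reporting the residue found in the capturing group makes it easy to separate the lysine-type sequences from the bacterial arginine-type sequences mentioned above.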

[ 1] Benkovic S.J., DeMaine M.M. Adv. Enzymol. 53:45-82(1982).
[ 2] Raines C.A., Lloyd J.C., Willingham N.M., Potts S., Dyer T.A. Eur. J. Biochem. 205:1053-1059(1992).
[ 3] Ke H., Thorpe C.M., Seaton B.A., Lipscomb W.N., Marcus F. J. Mol. Biol. 212:513-539(1989).

[0580] 191. FGGY family of carbohydrate kinases signatures

It has been shown [1] that four different types of carbohydrate kinases seem to be evolutionarily related. These enzymes are:
- L-fuculokinase (EC 2.7.1.51) (gene fucK).
- Gluconokinase (EC 2.7.1.12) (gene gntK).
- Glycerokinase (EC 2.7.1.30) (gene glpK).
- Xylulokinase (EC 2.7.1.17) (gene xylB).
- L-xylulose kinase (EC 2.7.1.53) (gene lyxK).
These enzymes are proteins of 480 to 520 amino acid residues. As consensus patterns for this family of kinases, two conserved regions were selected, one in the central section, the other in the C-terminal section.
[0581] Consensus pattern: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH]
Consensus pattern: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENQ]-[LIVMF]-x(2)-[AS]-[STAIVM]-[LIVMFY]-[DEQ]

[0582] [ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. Mol. Microbiol. 5:1081-1089(1991).
[0583] 192. FKBP-type peptidyl-prolyl cis-trans isomerase signatures/profile (FKBP)

FKBP [1,2,3] is the major high-affinity binding protein, in vertebrates, for the immunosuppressive drug FK506. It exhibits peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [4]. At least three different forms of FKBP are known in mammalian species:
- FKBP-12, which is cytosolic and inhibited by both FK506 and rapamycin.
- FKBP-13, which is membrane associated and inhibited by both FK506 and rapamycin.
- FKBP-25, which is preferentially inhibited by rapamycin.
These forms of FKBP are evolutionarily related and show extensive similarities [5,6,7] with the following proteins:
- Fungal FKBP.
- Mammalian hsp binding immunophilin (HBI) (also called p59). HBI is a protein which binds to hsp90 and contains two FKBP-like domains in its N-terminal section, the first of which seems to be functional.
- The C-terminal part of the cell-surface protein mip from Legionella, a protein associated with macrophage infection by an unknown mechanism.
- Escherichia coli slyD [8], a protein with an N-terminal FKBP domain followed by a histidine-rich metal-binding domain.
- Escherichia coli fkpA.
- Escherichia coli fklB (FKBP22).
- Escherichia coli slpA.
- Bacterial trigger factor (Tig).
- Streptomyces hygroscopicus and chrysomallus FK506-binding protein.
- Chlamydia trachomatis 27 Kd membrane protein.
- Neisseria meningitidis strain C114 PPIase.
- Probable PPIases from Haemophilus influenzae (HI0754), Methanococcus jannaschii (MJ0278 and MJ0825), Pseudomonas fluorescens and Pseudomonas aeruginosa.
Two signature patterns for these proteins were developed. One is based on a conserved region in the N-terminus of FKBP, the other is located in the central section. The profile for FKBP spans the complete domain.

[0584] Consensus pattern: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQK]-[STAN]
[0585] Consensus pattern: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G

[ 1] Tropschug M., Wachter E., Mayer S., Schoenbrunner E.R., Schmid F.X. Nature 346:674-677(1990).
[ 2] Stein R.L. Curr. Biol. 1:234-236(1991).
[ 3] Siekierka J.J., Wiederrecht G., Greulich H., Boulton D., Hung S.H.Y., Cryan J., Hodges P.J., Sigal N.H. J. Biol. Chem. 265:21011-21015(1990).
[ 4] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 5] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 6] Galat A. Eur. J. Biochem. 216:689-707(1993).
[ 7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[ 8] Wuelfing C., Lombardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

[0586] 193. MAPEG family (aka: FLAP/GST2/LTC4S family signature)
[0587] The following mammalian proteins are evolutionarily related [1]:

- Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.

- Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an enzyme that can also produce LTC4 from LTA4.

- 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of 5-lipoxygenase.

[0588] These are proteins of 150 to 160 residues that contain three transmembrane segments. As a signature pattern, a conserved region between the first and second transmembrane domains was selected.
[0589] Consensus pattern: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C

[0590] [1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203-22210(1996).
[0591] 194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been shown [1,2,3] to be structurally related; these enzymes are:
- Lactate dehydrogenase (EC 1.1.2.3), which consists of a dehydrogenase domain and a heme-binding domain called cytochrome b2 and which catalyzes the conversion of lactate into pyruvate.
- Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that catalyzes the conversion of glycolate and oxygen to glyoxylate and hydrogen peroxide.
- Long chain alpha-hydroxy acid oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme.
- Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase) from Mycobacterium smegmatis, which catalyzes the conversion of lactate and oxygen to acetate, carbon dioxide and water.
- (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the oxidation of (S)-mandelate to benzoylformate.
The first step in the reaction mechanism of these enzymes is the abstraction of the proton from the alpha-carbon of the substrate, producing a carbanion which can subsequently attach to the N5 atom of FMN. A conserved histidine has been shown [4] to be involved in the removal of the proton. The region around this active site residue is highly conserved and contains an arginine residue which is involved in substrate binding.
[0592] Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding residue]

[ 1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990).
[ 2] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[ 3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20880(1991).
[ 4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989).

[0593] 195. Flavin-binding monooxygenase-like (FMO-like)
[0594] This family includes FMO proteins, cyclohexanone monooxygenase
[0595] 196. (FPGS)

Folylpolyglutamate synthase signatures (aka Mur_ligase)

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes ATP-dependent addition of glutamate moieties to tetrahydrofolate.
[0597] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two signature patterns based on the conserved regions, which are rich in glycine residues and could play a role in the catalytic activity and/or in substrate binding.

[0598] Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK] Sequences known to belong to this class detected by the pattern: ALL.
[0599] Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2) Sequences known to belong to this class detected by the pattern: ALL.

[0600] [ 1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634(1993).

[0601] 197. FYVE zinc finger
[0602] The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1, and EEA1. The FYVE finger has been shown to bind two Zn++ ions [1]. The FYVE finger has eight potential zinc-coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where + represents a charged residue and X any residue. Members which do not conserve these histidine residues but are clearly related were also included.

[0603] [1] Stenmark H, Aasland R, Toh BH, D'Arrigo A; J Biol Chem 1996;271:24048-24054. [2] Gaullier JM, Simonsen A, D'Arrigo A, Bremnes B, Stenmark H, Aasland R; Nature 1998;394:432-433.
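The FYVE motif R+HHC+XCG uses a notation that differs from the consensus patterns elsewhere in this section: + stands for a charged residue and X for any residue. The Python sketch below shows how that motif might be written as a regular expression. It assumes that "charged" means Asp, Glu, Lys or Arg, which the text does not specify; the names CHARGED, FYVE_MOTIF and find_fyve_motif and the test sequences are likewise hypothetical.

```python
import re

# Assumption: "+" (a charged residue) is taken to mean D, E, K or R.
CHARGED = "DEKR"

# R + H H C + X C G, with "." standing for X (any residue).
FYVE_MOTIF = re.compile(f"R[{CHARGED}]HHC[{CHARGED}].CG")


def find_fyve_motif(sequence):
    """Return (1-based start position, matched text) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in FYVE_MOTIF.finditer(sequence)]


if __name__ == "__main__":
    print(find_fyve_motif("MSRKHHCRACGNN"))  # hypothetical test sequence
```

As the description notes, related members that do not conserve the two histidines would not be found by such a strict expression, so it is a screen for the common form of the motif rather than a test of family membership.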
[0604] 198. F_actin_cap_B
F-actin capping protein beta subunit signature

[0605] The F-actin capping protein binds in a calcium-independent manner to the fast growing ends of actin filaments (barbed end), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta.

[0606] The beta subunit is a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic species [1]. As a signature pattern, a conserved hexapeptide in the N-terminal section of the beta subunit was selected.
[0607] Consensus pattern: C-D-Y-N-R-D Sequences known to belong to this class detected by the pattern: ALL.
[0608] [1] Amatruda J.F., Cannon J.F., Tatchell K., Hug C., Cooper J.A. Nature 344:352-354(1990).
[0609] 199. Isopenicillin N synthetase signatures (Fe_Asc_oxidored)

Isopenicillin N synthetase (IPNS) [1,2] is a key enzyme in the biosynthesis of penicillin and cephalosporin. In the presence of oxygen, iron and ascorbate, it removes four hydrogen atoms from L-(alpha-aminoadipyl)-L-cysteinyl-D-valine to form the azetidinone and thiazolidine rings of isopenicillin. IPNS is an enzyme of about 330 amino acid residues. Two cysteines are conserved in fungal and bacterial IPNS sequences; these may be involved in iron-binding and/or substrate-binding. Cephalosporium acremonium DAOCS/DACS [3] is a bifunctional enzyme involved in cephalosporin biosynthesis. The DAOCS domain, which is structurally related to IPNS, catalyzes the step from penicillin N to deacetoxy-cephalosporin C, which is used as a substrate by DACS to form deacetylcephalosporin C. Streptomyces clavuligerus possesses a monofunctional DAOCS enzyme (gene cefE) [4] also related to IPNS. Two signature patterns for these enzymes were derived, centered around the conserved cysteine residues.
[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]
Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]

[ 1] Martin J.F. Trends Biotechnol. 5:306-308(1987).
[ 2] Chen G., Shiffman D., Mevarech M., Aharonowitz Y. Trends Biotechnol. 8:105-111(1990).
[ 3] Samson S.M., Dotzlaf J.E., Slisz M.L., Becker G.W., van Frank R.M., Veal L.E., Yeh W.K., Miller J.R., Queener S.W., Ingolia T.D. Bio/Technology 5:1207-1214(1987).
[ 4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-760(1989).

[0611] 200. Fibrillarin signature

Fibrillarin [1] is a component of a nucleolar small nuclear ribonucleoprotein (SnRNP) particle thought to participate in the first step of the processing of pre-rRNA. In mammals, fibrillarin is associated with the U3, U8 and U13 small nuclear RNAs [2]. Fibrillarin is an extremely well conserved protein of about 320 amino acid residues. Structurally it consists of three different domains:
- An N-terminal domain of about 80 amino acids which is very rich in glycine and contains a number of dimethylated arginine residues (DMA).
- A central domain of about 90 residues which resembles that of RNA-binding proteins and contains an octameric sequence similar to the RNP-2 consensus found in such proteins.
- A C-terminal alpha-helical domain.
A protein evolutionarily related to fibrillarin has been found [3] in archaebacteria such as Methanococcus vannielii or voltae. This protein (gene flpA) is involved in pre-rRNA processing. It lacks the Gly/Arg-rich N-terminal domain. As a signature pattern, a region was selected that starts with and encompasses the RNP-2 like octapeptide sequence.

[0612] Consensus pattern: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE]

[ 1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991).
[ 2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[ 3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

[0613] 201. Filamin/ABP280 repeat

[0614] [1] Fucini P, Renner C, Herberhold C, Noegel AA, Holak TA; Nat Struct Biol 1997;4:223-230.
[0615] 202. Fucosyl transferase

[0616] The fucosyltransferases of this family are the enzymes that transfer fucose from GDP-fucose to GlcNAc in an alpha1,3 linkage [1].
[0617] [1] Breton C, Oriol R, Imberty A; Glycobiology 1998;8:87-94.

[0618] 203. 2Fe-2S ferredoxins, iron-sulfur binding region signature (fer2A)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur cluster(s) and according to sequence similarities. One of these subgroups is the 2Fe-2S ferredoxins, which are proteins or domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur cluster. The proteins that are known [2] to belong to this family are listed below.
- Ferredoxin from photosynthetic organisms; namely plants and algae, where it is located in the chloroplast or cyanelle; and cyanobacteria.
- Ferredoxin from archaebacteria of the Halobacterium genus.
- Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus.
- Ferredoxin in the toluene degradation operon (gene xylT) and naphthalene degradation operon (gene nahT) of Pseudomonas putida.
- Hypothetical Escherichia coli protein yfaE.
- The N-terminal domain of the bifunctional ferredoxin/ferredoxin reductase electron transfer component of the benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter calcoaceticus, the toluene 4-monooxygenase complex (gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ), and the xylene monooxygenase system (gene xylA) from Pseudomonas.
- The N-terminal domain of phenol hydroxylase protein p5 (gene dmpP) from Pseudomonas putida.
- The N-terminal domain of methane monooxygenase component C (gene mmoC) from Methylococcus capsulatus.
- The C-terminal domain of the vanillate degradation pathway protein vanB in a Pseudomonas species.
- The N-terminal domain of bacterial fumarate reductase iron-sulfur protein (gene frdB).
- The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from Yersinia pseudotuberculosis.
- The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein.
- The N-terminal domain of eukaryotic xanthine dehydrogenase.
- The N-terminal domain of eukaryotic aldehyde oxidase.
In the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.
[0619] Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).
[0620] Adrenodoxin family, iron-sulfur binding region signature (fer2B)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur cluster(s) and according to sequence similarities. One family of ferredoxins groups together the following proteins that all bind a single 2Fe-2S iron-sulfur cluster:
- Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial protein which transfers electrons from adrenodoxin reductase to cytochrome P450scc, which is involved in cholesterol side chain cleavage.
- Putidaredoxin (PTX), a Pseudomonas putida protein which transfers electrons from putidaredoxin reductase to cytochrome P450-cam, which is involved in the oxidation of camphor.
- Terpredoxin [2], a Pseudomonas protein which transfers electrons from terpredoxin reductase to cytochrome P450-terp, which is involved in the oxidation of alpha-terpineol.
- Rhodocoxin [3], a Rhodococcus protein which transfers electrons from rhodocoxin reductase to cytochrome CYP116 (thcB), which is involved in the degradation of thiocarbamate herbicides.
- Escherichia coli ferredoxin (gene fdx) [4], whose exact function is not yet known.
- Rhodobacter capsulatus ferredoxin VI [5], which may transfer electrons to a yet uncharacterized oxygenase.
- Caulobacter crescentus ferredoxin (gene fdxB) [6].
In these proteins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.

[0621] Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe-2S ligands]

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Peterson J.A., Lu J.-Y., Geisselsoder J., Graham-Lorence S., Carmona C., Witney F., Lorence M.C. J. Biol. Chem. 267:14193-14203(1992).
[ 3] Nagy I., Schoofs G., Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 177:676-687(1995).
[ 4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).
[ 5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[ 6] Amemiya K. EMBL/Genbank: X51607.

[0622] 204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur cluster(s). One of these subgroups is the 4Fe-4S ferredoxins, which are found in bacteria and which are thus often referred to as "bacterial-type" ferredoxins. The structure of these proteins [2] consists of the duplication of a domain of twenty six amino acid residues; each of these domains contains four cysteine residues that bind to a 4Fe-4S center. A number of proteins have been found [3] that include one or more 4Fe-4S binding domains similar to those of bacterial-type ferredoxins. These proteins are listed below (references are only provided for recently determined sequences).

[0623] - The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase complexes (EC 1.3.99.1). These enzyme complexes, which are components of the tricarboxylic acid cycle, each contain three subunits: a flavoprotein, an iron-sulfur protein, and a b-type cytochrome. The iron-sulfur proteins contain three different iron-sulfur centers: a 2Fe-2S, a 3Fe-3S and a 4Fe-4S.
- Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC 1.1.99.5). This enzyme is composed of three subunits: A, B, and C. The C subunit seems to be an iron-sulfur protein with two ferredoxin-like domains in the N-terminal part of the protein.
- Escherichia coli anaerobic dimethyl sulfoxide reductase. The B subunit of this enzyme (gene dmsB) is an iron-sulfur protein with four 4Fe-4S ferredoxin-like domains.
- Escherichia coli formate hydrogenlyase. Two of the subunits of this oligomeric complex (genes hycB and hycF) seem to be iron-sulfur proteins that each contain two 4Fe-4S ferredoxin-like domains.
- Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). This enzyme is used by the archaebacteria to grow on formate. The beta chain of this dimeric enzyme probably binds two 4Fe-4S centers.
- Escherichia coli formate dehydrogenases N and O (EC 1.2.1.2). The beta chains of these two enzymes (genes fdnH and fdoH) are iron-sulfur proteins with four 4Fe-4S ferredoxin-like domains.
- Desulfovibrio periplasmic [Fe] hydrogenase (EC 1.18.99.1). The large chain of this dimeric enzyme binds three 4Fe-4S centers, two of which are located in the ferredoxin-like N-terminal region of the protein.
- Methanobacterium thermoautotrophicum methyl viologen-reducing hydrogenase subunit mvhB, which contains six tandemly repeated ferredoxin-like domains and which probably binds twelve 4Fe-4S centers.
- Salmonella typhimurium anaerobic sulfite reductase (EC 1.8.1.-) [4]. Two of the subunits of this enzyme (genes asrA and asrC) seem to both bind two 4Fe-4S centers.
- A ferredoxin-like protein (gene fixX) from the nitrogen-fixation genes locus of various Rhizobium species, and one from the nif-region of Azotobacter species.
- The 9 Kd polypeptide of chloroplast photosystem I [5] (gene psaC). This protein contains two low potential 4Fe-4S centers, referred to as the A and B centers.
- The chloroplast frxB protein, which is predicted to carry two 4Fe-4S centers.
- A ferredoxin from a primitive eukaryote, the enteric amoeba Entamoeba histolytica.
- Escherichia coli hypothetical protein yjjW, a protein with an N-terminal region belonging to the radical activating enzymes family (see <PDOC00834>) and two potential 4Fe-4S centers.
The pattern of cysteine residues in the iron-sulfur region is sufficient to detect this class of 4Fe-4S binding proteins.

[0624] Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG] [The four C's are 4Fe-4S ligands]- 
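Since bacterial-type ferredoxins and the related proteins listed above may carry several 4Fe-4S binding domains, a scan for this pattern would usually collect every occurrence in a sequence and not just the first. The Python sketch below illustrates this and reports the positions of the four cysteine ligands of each hit. The regular expression is a direct transcription of the pattern C-x(2)-C-x(2)-C-x(3)-C-[PEG]; the function name fer4_sites and the synthetic two-domain test sequence are assumptions made for illustration only.

```python
import re

# Regex form of C-x(2)-C-x(2)-C-x(3)-C-[PEG].
FER4 = re.compile(r"C.{2}C.{2}C.{3}C[PEG]")


def fer4_sites(sequence):
    """Return, for each hit, the 1-based positions of the four Cys ligands."""
    sites = []
    for m in FER4.finditer(sequence):
        s = m.start()
        # Within a hit the cysteines sit at 0-based offsets 0, 3, 6 and 10.
        sites.append([s + 1, s + 4, s + 7, s + 11])
    return sites


if __name__ == "__main__":
    # Synthetic sequence with two motif copies separated by a spacer.
    sequence = "AA" + "CAACAACAAACP" + "AAAAA" + "CGGCGGCGGGCE" + "A"
    for ligands in fer4_sites(sequence):
        print(ligands)
```

The number of hits returned gives a first estimate of how many 4Fe-4S centers a protein may bind, which can then be compared with the domain counts quoted in the list above.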

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[ 2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987).
[ 3] Beinert H. FASEB J. 4:2483-2492(1990).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).
[ 5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988).

[0625] 205. mWffxO family signatures (fer4_NifH) 

Nitrogenase (EC 1.18.6.1) [1] is the enzyme system responsible for biological nitrogen fixation. Nitrogenase is an
oligomeric complex which consists of two components: component 1, which contains the active site for the reduction
of nitrogen to ammonia, and component 2 (also called the iron protein). Component 2 is a homodimer of a protein (gene
nifH) which binds a single 4Fe-4S iron-sulfur cluster [2]. In the nitrogen fixation process nifH is first reduced by a
protein such as ferredoxin; the reduced protein then transfers electrons to component 1 with the concomitant
consumption of ATP. A number of proteins are known to be evolutionarily related to nifH. These proteins are: -
Chloroplast encoded frxC (or chlL) protein [3]. frxC is encoded on the chloroplast genome of some plant species; its
exact function is not known, but it could act as an electron carrier in the conversion of protochlorophyllide to
chlorophyllide. - Rhodobacter capsulatus proteins bchL and bchX [4]. These proteins are also likely to play a role in
chlorophyll synthesis. There are a number of conserved regions in the sequence of these proteins: in the N-terminal
section there is an ATP-binding site motif 'A' (P-loop) and in the central section there are two conserved cysteines
which have been shown, in nifH, to be the ligands of the 4Fe-4S cluster. Two signature patterns that correspond to
the regions around these cysteines were developed.
[0626] Consensus pattern: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G [C binds the iron-sulfur center]
Consensus pattern: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P [C binds the iron-sulfur center]

[ 1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989).
[ 2] Georgiadis M.M., Komiya H., Chakrabarti P., Woo D., Kornuc J.J., Rees D.C. Science 257:1653-1659(1992).
[ 3] Fujita Y., Takahashi Y., Kohchi T., Ozeki H., Ohyama K., Matsubara H. Plant Mol. Biol. 13:551-561(1989).
[ 4] Burke D.H., Alberti M., Hearst J.E. J. Bacteriol. 175:2407-2413(1993).

[0627] 206. Ferritin iron-binding regions signatures

Ferritin [1,2] is one of the major non-heme iron storage proteins. It consists of a mineral core of hydrated ferric
oxide, and a multi-subunit protein shell which englobes the former and assures its solubility in an aqueous
environment. In animals the protein is mainly cytoplasmic and there are generally two or more genes that encode
closely related subunits (in mammals there are two subunits, which are known as H(eavy) and L(ight)). In plants
ferritin is found in the chloroplast [3]. There are a number of well conserved regions in the sequence of ferritins.
Two of these regions were selected to develop signature patterns. The first pattern is located in the central part of
the sequence of ferritin and it contains three conserved glutamates which are thought to be involved in the binding
of iron. The second pattern is located in the C-terminal section; it corresponds to a region which forms a
hydrophilic channel through which small molecules and ions can gain access to the central cavity of the molecule;
this pattern also includes conserved acidic residues which are potential metal-binding sites.

[0628] Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The 3 E's are potential iron
ligands]

Consensus pattern: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-L-x(6)-[LIVM]-[KN] [The second D and the E are
potential iron ligands]

[ 1] Crichton R.R., Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987).
[ 2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987).
[ 3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. Chem.
265:18339-18344(1990).

[0629] 207. Intermediate filaments signature (filament) 

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the cytoskeleton and the nuclear
envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large
multigene family of proteins which has been subdivided in five major subgroups: - Type I: Acidic cytokeratins. -
Type II: Basic cytokeratins. - Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, and
plasticin. - Type IV: Neurofilaments L, H and M, alpha-internexin and nestin. - Type V: Nuclear lamins A, B1, B2 and
C. All IF proteins are structurally similar in that they consist of: a central rod domain comprising some 300 to 350
residues which is arranged in coiled-coil alpha-helices, with at least two short characteristic interruptions; an
N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail) which is also non-helical,
and which shows extreme length variation between different IF proteins. While IF proteins are evolutionarily and
structurally related, they have limited sequence homologies except in several regions of the rod domain. A conserved
region at the C-terminal extremity of the rod domain was used as a sequence pattern for this class of proteins.
[0630] Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]

[ 1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[ 2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[ 3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990).

[0631] 208. Flavodoxin signature 

Flavodoxins [1,E1] are electron-transfer proteins that function in various electron transport systems. Flavodoxins
bind one FMN molecule, which serves as a redox-active prosthetic group. Flavodoxins are functionally interchangeable
with ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae. The signature
pattern for these proteins is derived from a conserved region in their N-terminal section; this region is involved in
the binding of the FMN phosphate group.

[0632] Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV]
[ 1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989).
[0633] 209. Growth factor and cytokines receptors family signatures (fn3) 

A number of receptors for lymphokines, hematopoietic growth factors and growth hormone-related molecules have
been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are: - Cytokine
receptor common beta chain. This chain is common to the IL-3, IL-5 and GM-CSF receptors. - Cytokine receptor
common gamma chain. This chain is common to the IL-2, IL-4, IL-7 and IL-13 receptors. - Ciliary neurotrophic factor
receptor (CNTFR). - Erythropoietin receptor (EPOR). - Granulocyte colony-stimulating factor receptor (G-CSFR). -
Granulocyte-macrophage colony-stimulating factor receptor alpha chain (GM-CSFR). - Interleukin-2 receptor beta
chain (IL2R-beta). - Interleukin-3 receptor alpha chain (IL3R). - Interleukin-4 receptor alpha chain (IL4R). -
Interleukin-5 receptor alpha chain (IL5R). - Interleukin-6 receptor (IL6R). - Interleukin-7 receptor alpha chain
(IL7R). - Interleukin-9 receptor (IL9R). - Growth hormone receptor (GRHR). - Prolactin receptor (PRLR). -
Thrombopoietin receptor (TPOR). The conserved region constitutes all or part of the extracellular ligand-binding
region and is about 200 amino acid residues long. In the N-terminal of this domain there are two pairs of cysteines
known, in the growth hormone receptor, to be involved in disulfide bonds.
[Schematic: extracellular region carrying four conserved cysteines (C C C C), followed by the transmembrane and
cytoplasmic regions.]
Two patterns to detect this family of receptors were used. The first one is derived from the first N-terminal
disulfide loop; the second is a tryptophan-rich pattern located at the C-terminal extremity of the extracellular
region.
[0634] Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a disulfide bond]
Consensus pattern: [STGL]-x-W-[SG]-x-W-S

[ 1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[ 2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).
[ 3] Cosman D., Lyman S.D., Idzerda R.L., Beckmann M.P., Park L.S., Goodwin R.G., March C.J. Trends Biochem.
Sci. 15:265-270(1990).
[ 4] d'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).
[ 5] d'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).

[0635] 210. Phosphoribosylglycinamide formyltransferase active site (formyl_transf)

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2) (GART) [1] catalyzes the third step in de novo purine
biosynthesis, the transfer of a formyl group to 5'-phosphoribosylglycinamide. In higher eukaryotes, GART is part of a
multifunctional enzyme polypeptide that catalyzes three of the steps of purine biosynthesis. In bacteria, plants and
yeast, GART is a monofunctional protein of about 200 amino-acid residues. In the Escherichia coli enzyme, an aspartic
acid residue has been shown to be involved in the catalytic mechanism. The region around this active site residue is
well conserved in GART from prokaryotic and eukaryotic sources and can be used as a signature pattern. Mammalian
formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [2] is a cytosolic enzyme responsible for the NADP-dependent
decarboxylative reduction of 10-formyltetrahydrofolate into tetrahydrofolate. It is a protein of about 900 amino
acids consisting of three domains; the N-terminal domain (200 residues) is structurally related to GARTs.
Escherichia coli methionyl-tRNA formyltransferase (EC 2.1.2.9) (gene fmt) [3] is the enzyme responsible for modifying
the free amino group of the aminoacyl moiety of methionyl-tRNA(fMet). The central part of fmt seems to be
evolutionarily related to the GART active site region.

[0636] Consensus pattern: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-x(6)-[LIVM]
[D is the active site residue]

[ 1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990).
[ 2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).
[ 3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 174:4294-4301(1992).

[0637] 211. G10 protein signatures

A Xenopus protein known as G10 [1] has been found to be highly conserved in a wide range of eukaryotic species.
The function of G10 is still unknown. G10 is a protein of about 17 to 18 Kd (143 to 157 residues) which is hydrophilic
and whose C-terminal half is rich in cysteines and could be involved in metal-binding. As signature patterns, two of
these cysteine-rich segments were selected.

[0638] Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA]

[0639] [ 1] McGrew L.L., Dworkin-Rastl E., Dworkin M.B., Richter J.D. Genes Dev. 3:803-815(1989).
[0640] 212. G-protein alpha subunit

[0641] G proteins couple receptors of extracellular signals to intracellular signaling pathways. The G protein alpha
subunit binds guanyl nucleotide and is a weak GTPase. Number of members: 195

[1] Coleman DE, Berghuis AM, Lee E, Linder ME, Gilman AG, Sprang SR. Science 1994;265:1405-1412.
[2] How G proteins work: a continuing story. Coleman DE, Sprang SR. Trends Biochem Sci 1996;21:41-44.

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD)

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the pentose pathway, the
oxidation of glucose-6-phosphate to gluconolactone 6-phosphate. A lysine residue has been identified as a reactive
nucleophile associated with the activity of the enzyme. The sequence around this lysine is totally conserved from
bacterial to mammalian G6PD's and can be used as a signature pattern.
[0643] Consensus pattern: D-H-Y-L-G-K-[EQK] [K is the active site residue]

[0644] [ 1] Jeffery J., Persson B., Wood I., Bergman T., Jeffery R., Joernvall H. Eur. J. Biochem. 212:41-49(1993).
[0645] 214. GATA-type zinc finger domain
The GATA family of transcription factors are proteins that bind to DNA sites with the consensus sequence
(A/T)GATA(A/G), found within the regulatory region of a number of genes. Proteins currently known to belong to this
family are: - GATA-1 [1] (also known as Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and
other genes expressed in erythroid cells. It is a transcriptional activator which probably serves as a general
'switch' factor for erythroid development. - GATA-2 [2], a transcriptional activator which regulates endothelin-1
gene expression in endothelial cells. - GATA-3 [3], a transcriptional activator which binds to the enhancer of the
T-cell receptor alpha and delta genes. - GATA-4 [4], a transcriptional activator expressed in endodermally derived
tissues and heart. - Drosophila protein pannier (or DGATAa) (gene pnr), which acts as a repressor of the
achaete-scute complex (as-c). - Bombyx mori BCFI [5], which regulates the expression of chorion genes. -
Caenorhabditis elegans elt-1 and elt-2, transcriptional activators of genes containing the GATA region, including
vitellogenin genes [6]. - Ustilago maydis urbs1 [7], a protein involved in the repression of the biosynthesis of
siderophores. - Fission yeast protein GAF2. All these transcription factors contain a pair of highly similar 'zinc
finger' type domains with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins contain a single zinc finger
motif highly related to those of the GATA transcription factors. These proteins are: - Drosophila box A-binding
factor (ABF) (also known as protein serpent (gene srp)), which may function as a transcriptional activator protein
and may play a key role in the organogenesis of the fat body. - Emericella nidulans areA [8], a transcriptional
activator which mediates nitrogen metabolite repression. - Neurospora crassa nit-2 [9], a transcriptional activator
which turns on the expression of genes coding for enzymes required for the use of a variety of secondary nitrogen
sources, during conditions of nitrogen limitation. - Neurospora crassa white collar proteins 1 and 2 (WC-1 and
WC-2), which control expression of light-regulated genes. - Saccharomyces cerevisiae DAL81 (or UGA43), a negative
nitrogen regulatory protein. - Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein. -
Saccharomyces cerevisiae GAT1. - Saccharomyces cerevisiae GZF3.

[0646] Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are
zinc ligands]

[ 1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990).
[ 2] Lee M.E., Temizer D.T., Clifford J.A., Quertermous T. J. Biol. Chem. 266:16188-16192(1991).
[ 3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO J. 10:1187-1192(1991).
[ 4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 11:4651-4659(1991).
[ 5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994).
[ 6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995).
[ 7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 13:7091-7100(1993).
[ 8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. Trends Genet. 5:291-291
(1989).
[ 9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990).
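
The two descriptions of the GATA zinc finger given in this entry, the schematic spacing C-x2-C-x17-C-x2-C and the longer consensus pattern, can be checked against each other by adding up how many residues the pattern allows between successive cysteines. The Python sketch below does this arithmetic. It assumes the consensus pattern reads as written in the GATA string (a reading of the printed pattern in which the third zinc ligand is a cysteine); the helper names element_length and cysteine_spacings are hypothetical.

```python
import re

# Assumed reading of the GATA consensus pattern (see the lead-in above).
GATA = "C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C"


def element_length(element):
    """Return (min, max) number of residues one pattern element can match."""
    repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
    if repeat is None:
        return 1, 1                      # a single position
    low = int(repeat.group(1))
    high = int(repeat.group(2)) if repeat.group(2) else low
    return low, high


def cysteine_spacings(pattern):
    """List the (min, max) residue counts between consecutive C elements."""
    spacings, current = [], None
    for element in pattern.split("-"):
        if element == "C":
            if current is not None:
                spacings.append(tuple(current))
            current = [0, 0]             # start counting after this cysteine
        elif current is not None:
            low, high = element_length(element)
            current[0] += low
            current[1] += high
    return spacings


for low, high in cysteine_spacings(GATA):
    print(low if low == high else "%d-%d" % (low, high))
```

Under this reading the spacers come out as 2, 17 to 19, and 2 residues, so the schematic C-x2-C-x17-C-x2-C corresponds to the shortest form of the central spacer.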

[0647] 215. Glutamine amidotransferases class-I active site (GATase)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as
glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic
subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and
class-II (also known as purF-type). Class-I GATase domains have been found in the following enzymes: - The second
component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from
chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using
ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the
GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of
tryptophan. - The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic
enzyme that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate
and glutamine. The second component (gene pabA) provides the GATase activity [4]. - CTP synthase (EC 6.3.4.2). CTP
synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP
and glutamine. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the
C-terminal section [2]. - GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the
ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme
that contains two distinct domains; the GATase domain is in the N-terminal section [5]. - Glutamine-dependent
carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase); an enzyme involved in both arginine and pyrimidine
biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon
dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase
activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine
biosynthesis is also composed of two subunits: CPA1 (GATase), and CPA2 (CPSase). In most eukaryotes, the first three
steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary
in Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein
[6]. - Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the
de novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small
chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator
activity. - The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of
histidine in prokaryotes. In the second component of AS a cysteine has been shown [7] to be essential for the
amidotransferase activity. The sequence around this residue is well conserved in all the above GATase domains and
can be used as a signature pattern for class-I GATase.

[0648] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue]

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0649] 216. Glutamine amidotransferases class-II active site (GATase_2)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then
to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as
glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic
subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and
class-II (also known as purF-type). Class-II GATase domains have been found in the following enzymes: - Amido
phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14). An enzyme which
catalyzes the first step in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP to form
5-phosphoribosylamine (gene purF in bacteria, ADE4 in yeast). - Glucosamine-fructose-6-phosphate aminotransferase
(EC 2.6.1.16). This enzyme catalyzes a key reaction in amino sugar synthesis, the formation of glucosamine
6-phosphate from fructose 6-phosphate and glutamine (gene glmS in Escherichia coli, nodM in Rhizobium, GFA1 in
yeast). - Asparagine synthetase (glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is responsible for the synthesis
of asparagine from aspartate and glutamine. A cysteine is present at the N-terminal extremity of the mature form of
all these enzymes. The cysteine has been shown, in amido phosphoribosyltransferase [4] and in asparagine synthetase
[5], to be important for the catalytic mechanism.

[0650] Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue]

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[ 4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989).
[ 5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 258:10582-10585(1983).
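
The class-II pattern differs from most patterns in this list in that it is anchored: the leading '<' ties it to the N-terminus, and x(0,11) allows at most eleven residues in front of the cysteine. A minimal Python sketch of that anchoring rule follows. The regular expression is a hand-written equivalent of the pattern, the function name active_site_position is hypothetical, and the sequences are invented toy strings rather than real enzymes.

```python
import re

# Hand-written equivalent of <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG]:
# '^' plays the role of '<', and the capture group marks the cysteine.
CLASS_II = re.compile(r"^.{0,11}?(C)[GS][IV][LIVMFYW][AG]")


def active_site_position(sequence):
    """Return the 1-based position of the candidate cysteine, or None."""
    match = CLASS_II.match(sequence)
    return None if match is None else match.start(1) + 1


# Invented toy sequences, used only to exercise the anchoring rule.
print(active_site_position("CGIVGAAA"))              # motif at the very start
print(active_site_position("MKTCGIFGAAA"))           # three residues precede it
print(active_site_position("A" * 13 + "CGIVGAAA"))   # too far from the N-terminus
```

The third call returns None even though the motif itself is present, because more than eleven residues precede the cysteine; an unanchored search would report it as a hit.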

[0651] 217. GDP dissociation inhibitor (GDI)

[0652] [1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson IA, Balch WE. Nature 1996;
381:42-48.

[0653] 218. Oxidoreductase family (GFO_IDH_MocA)

[0654] This family of enzymes utilises NADP or NAD. This family is called the GFO/IDH/MOCA family in Swiss-Prot.
[0655] [1] Kingston RL, Scopes RK, Baker EN. Structure 1996;4:1413-1428.
[0656] 219. GHMP kinases putative ATP-binding domain

The following kinases contain, in their N-terminal section, a conserved Gly/Ser-rich region which is probably
involved in the binding of ATP [1]. These kinases are listed below. - Galactokinase (EC 2.7.1.6). - Homoserine kinase
(EC 2.7.1.39). - Mevalonate kinase (EC 2.7.1.36). - Phosphomevalonate kinase (EC 2.7.4.2). This group of kinases was
called 'GHMP' (from the first letter of their substrates).

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC]
[0657] [ 1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991).

[0658] 220. Glucose inhibited division protein A family signatures (GIDA)

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose function is not yet known and
whose sequence is highly conserved. It is evolutionarily related to yeast hypothetical protein YGL236C,
Caenorhabditis elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein called gid (which is different
from B. subtilis gidA). Two highly conserved regions were selected as signature patterns. Both regions are located in
the central region of the protein.

[0659] Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]

Consensus pattern: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A

[0660] 221. (GLFV_dehydrog)
Glu / Leu / Phe / Val dehydrogenases active site

- Glutamate dehydrogenases (EC 1.4.1.2, EC 1.4.1.3, and EC 1.4.1.4) (GluDH) are enzymes that catalyze the NAD-
or NADP-dependent reversible deamination of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are
generally involved with either ammonia assimilation or glutamate catabolism.

- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible
deamination of leucine and several other aliphatic amino acids to their keto analogues [3].

- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme that catalyzes the reversible
deamination of L-phenylalanine into phenylpyruvate [4].

- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that catalyzes the reversible
deamination of L-valine into 3-methyl-2-oxobutanoate [5].

[0661] These dehydrogenases are structurally and functionally related. A conserved lysine residue located in a
glycine-rich region has been implicated in the catalytic mechanism. The conservation of the region around this
residue allows the derivation of a signature pattern for such type of enzymes.

[0662] Consensus pattern: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL] [K is the active site residue]. Sequences
known to belong to this class detected by the pattern: ALL.

[0663] Note: all known sequences from this family have Pro in the last position of the pattern with the exception of
yeast GluDH, which has Leu.

[ 1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992).
[ 2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993).
[ 3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. Biochemistry 27:9056-9062
(1988).
[ 4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371-376(1991).
[ 5] Hutchinson C.R., Tang L. J. Bacteriol. 175:4176-4185(1993).

[0664] 222. GMC oxidoreductases signatures 

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes,
which are called 'GMC oxidoreductases', are listed below. - Glucose oxidase (EC 1.1.3.4) (GOX) from Aspergillus
niger. Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide. - Methanol oxidase
(EC 1.1.3.13) (MOX) from fungi. Reaction catalyzed: methanol + oxygen -> formaldehyde + hydrogen peroxide. - Choline
dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine aldehyde
+ reduced acceptor. - Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose +
unknown acceptor -> delta-gluconolactone + reduced acceptor. - Cholesterol oxidase (CHOD) (EC 1.1.3.6) from
Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen ->
cholest-4-en-3-one + hydrogen peroxide. - AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which
converts aliphatic medium-chain-length alcohols into aldehydes. This family also includes a lyase: -
(R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis,
the release of hydrogen cyanide from injured tissues. These enzymes are proteins of size ranging from 556 (CHD) to
664 (MOX) amino acid residues which share a number of regions of sequence similarities. One of these regions,
located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved
domains is not yet known; two of these domains were selected as signature patterns. The first one is located in the
N-terminal section of these enzymes, about 50 residues after the ADP-binding domain, while the second one is located
in the central section.

[0665] Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH]
Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992).
[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).
[ 3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[ 4] Cheng I.P., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0666] 223. (GMP_synt_C)
Glutamine amidotransferases class-I active site
[0667] A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine
and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known
as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic
subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence
similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II
(also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of
anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize
anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some
bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other
steps of the biosynthesis of tryptophan.

- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme
that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and
glutamine. The second component (gene pabA) provides the GATase activity [4].

- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the
ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the C-terminal section [2].

- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of
GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two
distinct domains; the GATase domain is in the N-terminal section [5].

- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase); an enzyme involved in both
arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate
from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene
carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the
enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase), and CPA2 (CPSase).
In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme
(called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the
N-terminal extremity of this polyprotein [6].

- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de
novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small
chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator
activity.

- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in
prokaryotes.

[0668] In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity.
The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature
pattern for class-I GATase.

[0669] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the
active site residue]. Sequences known to belong to this class detected by the pattern: ALL, except for 6 sequences.
[0670] Note: in the first position of the pattern Pro is found in all cases except in the slime mold GD-CPSase, where
it is replaced by Ala.

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973).
[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).
[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).
[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).
[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0671] 224. Glutathione peroxidases signatures (GSHPx)

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the reduction of hydroxyperoxides
by glutathione. Its main function is to protect against the damaging effect of endogenously formed hydroxyperoxides.
In higher vertebrates at least four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a
gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an epididymal secretory form
(GSHPx-EP). In addition to these characterized forms, the sequence of a protein of unknown function [5] has been
shown to be evolutionarily related to those of GSHPx's. In filarial nematode parasites such as Brugia pahangi the major
soluble cuticular protein, known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the
immune reaction of the mammalian host by neutralizing the products of the oxidative burst of leukocytes [6]. Escherichia
coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPx's;
the significance of this relationship is not yet clear. Selenium, in the form of selenocysteine [7], is part of the catalytic
site of GSHPx. The sequence around the selenocysteine residue is moderately well conserved in GSHPx's and the
related proteins and can be used as a signature pattern. As a second signature for this family of proteins a highly
conserved octapeptide located in the central section of these proteins was selected.
[0672] Consensus pattern: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[STC]-x-C-... [C is the active
site selenocysteine residue]
Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-...

[ 1] Mannervik B. Meth. Enzymol. 113:490-495(1985).
[ 2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).
[ 3] Chu F.F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).
[ 4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J. Biochem. 108:145-148(1990).
[ 5] Dunn D.K., Howells D.D., Richardson J., Goldfarb P.S. Nucleic Acids Res. 17:6390-6390(1989).
[ 6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).
[ 7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

[0673] 225. (GST)
Glutathione S-transferases

[0674] Function: conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but
not GSTs:

- S-crystallins from squid. Similarity to GST was previously noted.
- Eukaryotic elongation factors 1-gamma. Not known to have GST activity; similarity not previously recognized.
Supported by HMM and manual alignment inspection.
- HSP26 family of stress-related proteins, including auxin-regulated proteins in plants and stringent starvation
proteins in E. coli. Not known to have GST activity. Similarity not previously recognized. Supported by HMM and
manual alignment inspection.

Alignment spans entire protein.
[0675] 226. GTP1/OBG family signature

A widespread family of GTP-binding proteins has been recently characterized [1,2]. This family currently includes:

- Mouse and Xenopus protein DRG.
- Human protein DRG2.
- Drosophila protein 128up.
- Fission yeast protein gtp1.
- A Halobacterium cutirubrum hypothetical protein in a ribosomal protein gene cluster.
- Bacillus subtilis protein obg. Obg has been experimentally shown to bind GTP.
- Escherichia coli hypothetical protein yhbZ.
- Haemophilus influenzae hypothetical protein HI0877.
- Mycoplasma genitalium hypothetical protein MG384.
- Yeast hypothetical protein YAL036C (FUN11).
- Yeast hypothetical protein YGR173w.
- Caenorhabditis elegans hypothetical protein C02F5.3.

The function of the proteins that belong to this family is not yet known. They are polypeptides of about 40 to 48 Kd which
contain the five small sequence elements characteristic of GTP-binding proteins [3]. As a signature pattern the region
that corresponds to the ATP/GTP B motif (also called G-3 in GTP-binding proteins) was selected.
[0676] Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G

[ 1] Sazuka T., Tomooka Y., Ikawa Y., Noda M., Kumar S. Biochem. Biophys. Res. Commun. 189:363-370(1992).
[ 2] Hudson J.D., Young P.G. Gene 125:191-193(1993).
[ 3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991).

[0677] 227. (GTP_EFTU1)
ATP/GTP-binding site motif A (P-loop)

[0678] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an
appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs.
The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand
and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is
generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding
proteins in which the P-loop is found. Listed below are a number of protein families for which the relevance of the
presence of such a motif has been noted:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the
structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the
E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly
different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase,
in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
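
As a rough illustration of how a consensus pattern written in this notation can be applied to a sequence, the following Python sketch translates the pattern syntax into a regular expression and scans a sequence for matches. The function names (`prosite_to_regex`, `find_motifs`) and the short test fragment are assumptions introduced for illustration only; they are not part of the procedure described here, and the sketch covers only the syntax elements that appear in this section (residue classes, `x`, and repeat counts).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as '[AG]-x(4)-G-K-[ST]' into a regex string."""
    # Drop stray blanks and the trailing '.' or '-' that often ends a printed pattern.
    cleaned = pattern.replace(" ", "").rstrip(".").strip("-")
    parts = []
    for element in cleaned.split("-"):
        # An element is a core (letter, x, [..] or {..}) plus an optional (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low is not None and high is not None:
            core += "{%s,%s}" % (low, high)     # variable repeat, e.g. x(2,3)
        elif low is not None:
            core += "{%s}" % low                # fixed repeat, e.g. x(4)
        parts.append(core)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (1-based start position, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Illustrative fragment containing a glycine-rich loop; not taken from the text.
    fragment = "MTEYKLVVVGAGGVGKSALTIQ"
    print(find_motifs(fragment, "[AG]-x(4)-G-K-[ST]-"))
```

Because the translation is purely syntactic, a hit only indicates that the sequence is compatible with the pattern; as noted above, some nucleotide-binding proteins are missed by this motif.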

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P.F., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[0679] GTP-binding elongation factors signature (GTP_EFTU2)

Elongation factors [1,2] are proteins catalyzing the elongation of peptide chains in protein biosynthesis. In both
prokaryotes and eukaryotes, there are three distinct types of elongation factors, as described in the following table:

Eukaryotes   Prokaryotes   Function
EF-1alpha    EF-Tu         Binds GTP and an aminoacyl-tRNA; delivers the latter to the A site of ribosomes.
EF-1beta     EF-Ts         Interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a.
EF-2         EF-G          Binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site.

The GTP-binding elongation factor family also includes the following proteins:

- Eukaryotic peptide chain release factor GTP-binding subunits [3]. These proteins interact with release factors that
bind to ribosomes that have encountered a stop codon at their decoding site and help them to induce release of the
nascent polypeptide. The yeast protein was known as SUP2 (and also as SUP35, SUF12 or GST1) and the human
homolog as GST1-Hs.
- Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC). RF-3 is a class-II RF, a GTP-binding protein that
interacts with class-I RFs (see <PDOC00607>) and enhances their activity [4].
- Prokaryotic GTP-binding protein lepA and its homolog in yeast (gene GUF1) and in Caenorhabditis elegans
(ZK1236.1).
- Yeast HBS1 [5].
- Rat statin S1 [6], a protein of unknown function which is highly similar to EF-1alpha.
- Prokaryotic selenocysteine-specific elongation factor selB [7], which seems to replace EF-Tu for the insertion of
selenocysteine directed by the UGA codon.
- The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria such as Campylobacter jejuni,
Enterococcus faecalis, Streptococcus mutans and Ureaplasma urealyticum. Tetracycline binds to the prokaryotic
ribosomal 30S subunit and inhibits binding of aminoacyl-tRNAs. These proteins abolish the inhibitory effect of
tetracycline on protein synthesis.
- Rhizobium nodulation protein nodQ [10].
- Escherichia coli hypothetical protein yihK [11].

In EF-1-alpha, a specific region has been shown [12] to be involved in a conformational change mediated by the
hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu as well as EF-2/EF-G and thus seems
typical for GTP-dependent proteins which bind non-initiator tRNAs to the ribosome. The pattern developed for this
family of proteins includes that conserved region.

[0680] Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ]

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).
[ 3] Stansfield I., Jones K.M., Kushnirov V.V., Dagkesamanskaya A.R., Poznyakovski A.I., Paushkin S.V., Nierras
C.R., Cox B.S., Ter-Avanesyan M.D., Tuite M.F. EMBO J. 14:4365-4373(1995).
[ 4] Grentzmann G., Brechemier-Baey D., Heurgue-Hamard V., Buckingham R.H. J. Biol. Chem. 270:10595-10600(1995).
[ 5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell 71:97-105(1992).
[ 6] Ann D.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao P.-L., Lee M.-J., Chin S., Liem R.K.H., Wang E. J. Biol.
Chem. 266:10429-10437(1991).
[ 7] Forchhammer K., Leinfelder W., Boeck A. Nature 342:453-456(1989).
[ 8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).
[ 9] LeBlanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol. 170:3618-3626(1988).
[10] Cervantes E., Sharma S.B., Maillet F., Vasse J., Truchet G., Rosenberg C. Mol. Microbiol. 3:745-755(1989).
[11] Plunkett G. III, Burland V.D., Daniels D.L., Blattner F.R. Nucleic Acids Res. 21:3391-3398(1993).
[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).

[0681] 228. GTP cyclohydrolase II.
GTP cyclohydrolase II catalyses the first committed step in the biosynthesis of riboflavin.

[0682] [1] Richter G., Ritz H., Katzenmeier G., Volk R., Kohnle A., Lottspeich F., Allendorf D., Bacher A. J. Bacteriol. 1993
175:4045-4051.

[0683] 229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

Galactose-1-phosphate uridyl transferase (EC 2.7.7.10) (galT) catalyzes the transfer of a uridyldiphosphate group onto
galactose (or glucose) 1-phosphate. During the reaction, the uridyl moiety links to a histidine residue. In the Escherichia
coli enzyme, it has been shown [1] that two histidine residues separated by a single proline residue are essential for
enzyme activity. On the basis of sequence similarities, two apparently unrelated families seem to exist. Class-I enzymes
are found in eukaryotes as well as some bacteria such as Escherichia coli or Streptomyces lividans, while class-II
enzymes have been found so far only in bacteria such as Bacillus subtilis or Lactobacillus helveticus [2]. Signature
patterns for both families were developed. For class-I enzymes the signature is based on the active site residues. For
class-II enzymes a region which also includes two conserved histidines was chosen.
Consensus pattern: R-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site residues]
[0684] Consensus pattern: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G. Note: class-I enzymes are
structurally related to the HIT family of proteins (see <PDOC00694>).
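
The class-I signature can be used not only to detect a family member but also to report where the two active site histidines fall in a sequence. The Python sketch below is one hedged way to do this: it hard-codes a regular-expression rendering of R-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q and uses capture groups for the two histidines. The function name `active_site_histidines` and the test sequence are made up for illustration and are not drawn from any real galT sequence.

```python
import re

# Regular-expression form of R-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q.
# The two capture groups mark the histidines that flank the single proline.
CLASS_I_GALT = re.compile(r"REN[RK]G.{3}G.{4}(H)P(H).Q")


def active_site_histidines(sequence: str):
    """Return the 1-based positions of both histidines for every pattern hit."""
    return [
        (m.start(1) + 1, m.start(2) + 1)
        for m in CLASS_I_GALT.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    toy = "AAARENKGAAAGAAAAHPHAQAA"  # made-up sequence that satisfies the pattern
    print(active_site_histidines(toy))
```

Reporting positions rather than a yes/no answer makes it easier to compare a candidate with the residues shown experimentally to be essential in the Escherichia coli enzyme.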


[ 1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[ 2] Mollet B., Pilloud N. J. Bacteriol. 173:4464-4473(1991).

[0685] 230. Gamma-thionins family signature
[0686] The following small plant proteins are evolutionarily related:

- Gamma-thionins from wheat endosperm (gamma-purothionins) and barley (gamma-hordothionins) which are toxic
to animal cells and inhibit protein synthesis in cell free systems [1].
- A flower-specific thionin (FST) from tobacco [2].
- Antifungal proteins (AFP) from the seeds of Brassicaceae species such as radish, mustard, turnip and Arabidopsis
thaliana [3].
- Inhibitors of insect alpha-amylases from sorghum [4].
- Probable protease inhibitor P322 from potato.
- A germination-related protein from cowpea [5].
- Anther-specific protein SF18 from sunflower [6]. SF18 is a protein that contains a gamma-thionin domain at its
N-terminus and a proline-rich C-terminal domain.
- Soybean sulfur-rich protein SE60 [7].
- Vicia faba antibacterial peptides fabatin-1 and -2.

[0687] In their mature form, these proteins generally consist of about 45 to 50 amino-acid residues. As shown in the
following schematic representation, these peptides contain eight conserved cysteines involved in disulfide bonds.



    xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC
    ************************

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.



[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GR]-x-C-x(5)-C-x(3)-C [The four C's are involved in
disulfide bonds]

[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724(1993).
[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. 234:89-96(1992).
[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F.
FEBS Lett. 316:233-240(1993).
[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).
[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).
[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).

[0689] 231. Gelsolin. Gelsolin repeat. Number of members: 170

[0690] [1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin severing, capping, and
nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, McLaughlin PJ, Robinson RC; Cell 1997;90:661-670.

[0691] 232. Germin family signature

Germins [1] are a family of homopentameric cereal glycoproteins expressed during germination which may play a role
in altering the properties of cell walls during germinative growth. It has been shown that wheat and barley germins act
as oxalate oxidases (EC 1.2.3.4), an enzyme that catalyzes the oxidative degradation of oxalate to carbonate and
hydrogen peroxide. Germins are highly similar to:

- Germin-like proteins from various plants such as rape, violet or white mustard.
- Slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation,
a process induced by various forms of environmental stress which leads to encystment and dormancy.

As a signature pattern the best conserved region was selected: a decapeptide located in the central section of these
proteins.
[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]
[0693] [1] Lane B.G. FASEB J. 8:294-301(1994).

[0694] 233. (GlutR)

Glutamyl-tRNA reductase signature

[0695] Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles including
porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or
C4) pathway, which involves the single step condensation of succinyl-CoA and glycine and which is catalyzed by ALA
synthase (EC 2.3.1.37), and the C5 pathway from the five-carbon skeleton of glutamate. The C5 pathway operates
in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.
[0696] The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1] which catalyzes the
NADP-dependent conversion of glutamate-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant
release of tRNA(Glu) which can then be recharged with glutamate by glutamyl-tRNA synthetase.

[0697] GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved regions. The best
conserved region is located in positions 99 to 122 in the sequence of known GluTR. This region seems important for
the activity of the enzyme. We have developed a signature pattern from that conserved region.
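
The quoted size and length range can be cross-checked with a simple rule of thumb. The sketch below assumes an average mass of roughly 110 Da per amino-acid residue; that figure is a widely used approximation introduced here for illustration, not a value stated in this description, and the function name `estimate_mass_kd` is likewise hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb value, not from the text


def estimate_mass_kd(residue_count: int,
                     residue_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Rough protein mass in Kd (kilodaltons) from the number of residues."""
    return residue_count * residue_mass_da / 1000.0


if __name__ == "__main__":
    for length in (467, 550):  # length range quoted for GluTR
        print(length, "residues ->", round(estimate_mass_kd(length), 1), "Kd")
```

Under this assumption the lower end of the length range comes out close to the stated figure of about 50 Kd, while the upper end is somewhat heavier, which is consistent with "about 50 Kd" being a rounded description of the family.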
[0698] Consensus pattern: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-x-[EQR]-
[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR]. Sequences known to belong to this class detected by the pattern: ALL.

[0699] [1] Jahn D., Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992).
[0700] 234. (Glycoprotease)
Glycoprotease family signature (aka Peptidase_M22)

[0701] Glycoprotease (GCP) (EC 3.4.24.57) [1], or o-sialoglycoprotein endopeptidase, is a metalloprotease secreted
by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of
GCP is highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.
[0703] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-...
Sequences known to belong to this class detected by the pattern: ALL.
[0704] Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[ 1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0705] 235. (Glucosamine.iso)

Glucosamine/galactosamine-6-phosphate isomerases signature

Glucosamine-6-phosphate isomerase (EC 5.3.1.10) (or Glc-6-P deaminase) is the enzyme responsible for the
conversion of glucosamine 6-phosphate into fructose 6-phosphate [1]. It is the last specific step in the pathway for
N-acetylglucosamine (GlcNAC) utilization in bacteria such as Escherichia coli (gene nagB) or in fungi such as Candida
albicans (gene NAG1). Glc-6-P isomerase is evolutionarily related to:

- A putative Escherichia coli galactosamine-6-phosphate isomerase (gene agaI) [2].
- Escherichia coli hypothetical protein yieK.
- Bacillus subtilis hypothetical protein ybfT.

As a signature pattern a conserved region located in the central part of these enzymes was selected. This region contains
a conserved histidine which has been shown [1], in nagB, to be important for the pyranose ring-opening step of the
catalytic mechanism.

[0706] Consensus pattern: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H

[ 1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. Structure 3:1323-1332(1995).
[ 2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231-250(1996).

[0707] 236. Pneumovirus attachment glycoprotein G (glycoprotein G)

[0708] This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown
to have any neuraminidase or hemagglutinin activity (Swiss-Prot). The amino terminus is thought to be cytoplasmic,
and the carboxyl terminus extracellular. The extracellular region contains four completely conserved cysteine residues.
[0709] [1] Johnson PR, Spriggs MK, Olmsted RA, Collins PL. Proc Natl Acad Sci U S A 1987;84:5625-5629.
[0710] 237. Glycosyl transferases group 1

[0711] Mutations in this domain of Swiss: P37287 lead to disease (paroxysmal nocturnal haemoglobinuria). Members
of this family transfer activated sugars to a variety of substrates, including glycogen, fructose-6-phosphate and
lipopolysaccharides. Members of this family transfer UDP, ADP, GDP or CMP linked sugars. The eukaryotic glycogen
synthases may be distant members of this family.
[0712] 238. Glycosyl transferases (Glycos_transf_2)

[0713] Diverse family, transferring sugar from UDP-glucose, UDP-N-acetyl-galactosamine, GDP-mannose or
CDP-abequose, to a range of substrates including cellulose, dolichol phosphate and teichoic acids.
[0714] 239. (Glucos_transf_3)

Thymidine and pyrimidine-nucleoside phosphorylases signature

[0715] Thymidine phosphorylase (EC 2.4.2.4) catalyzes the reversible phosphorolysis of thymidine, deoxyuridine
and their analogues to their respective bases and 2-deoxyribose 1-phosphate. This enzyme regulates the availability
of thymidine and is therefore essential to nucleic acid metabolism.

[0716] In Escherichia coli (gene deoA), the enzyme is a dimer of identical subunits of about 48 Kd [1]. In humans it
was first identified as platelet-derived endothelial cell growth factor (PD-ECGF) [E1] before being recognized [2] as
thymidine phosphorylase.

[0717] Bacterial pyrimidine-nucleoside phosphorylase (EC 2.4.2.2) (gene pdp) [3] is an enzyme evolutionarily and
structurally related to thymidine phosphorylase.

[0718] A well conserved region of 19 residues located in the N-terminal part of these proteins was selected as a
signature pattern for these enzymes.

[0719] Consensus pattern: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E. Sequences known to belong to this
class detected by the pattern: ALL.

[ 1] Walter M.R., Cook W.J., Cole L.B., Short S.A., Koszalka G.W., Krenitsky T.A., Ealick S.E. J. Biol. Chem. 265:
14016-14022(1990).
[ 2] Furukawa T., Yoshimura A., Sumizawa T., Haraguchi M., Akiyama S.-I., Fukui K., Yamada Y. Nature 356:668-668(1992).
[ 3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).
[0720] 240. Glycos_transf_4. Glycosyl transferase. Number of members: 44.
[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases.
Lehrman MA; Glycobiology 1994;4:768-771.

[0722] 241. Glycosyl hydrolases family 15, 21 members.

[0723] 242. Glycosyl hydrolases family 16 signature

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also from Clostridium
thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of Bacillus
licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0724] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]
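
The x(0,1) element means that the two catalytic glutamates may be separated by three or by four residues. A minimal Python sketch of how such a variable-length element behaves is given below; the regular expression is a hand translation of the pattern as reconstructed above, and the function name and both toy sequences are hypothetical, constructed only to show the two possible spacings.

```python
import re

# Hand translation of E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].
# '.{0,1}' is the optional residue; the capture groups mark the two glutamates.
FAMILY_16 = re.compile(r"(E)[LIV]D[LIV].{0,1}(E).{2}[GQ][KRNF].[PSTA]")


def glutamate_pairs(sequence: str):
    """Return 1-based positions of the two pattern glutamates for each hit."""
    return [
        (m.start(1) + 1, m.start(2) + 1)
        for m in FAMILY_16.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    without_insert = "GGEIDIEAAGKAPGG"   # made-up: no residue at x(0,1)
    with_insert = "GGEIDIMEAAGKAPGG"     # made-up: one residue at x(0,1)
    print(glutamate_pairs(without_insert))
    print(glutamate_pairs(with_insert))
```

Both toy sequences satisfy the signature; only the distance between the reported glutamate positions differs, which is the flexibility the x(0,1) notation is meant to express.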


[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).

[0725] 243. Glycosyl hydrolases family 17 signature

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme
may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall
polysaccharides.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme may
play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during sporulation.
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

The best conserved region in the sequence of these enzymes is located in their central section. This region contains a
conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and it also
contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism. This
region was used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.B., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:
2785-2789(1994).

[0726] 244. Glyoxalase I signatures

Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal pathway, the transformation
of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [1].
Glyoxalase I is a ubiquitous enzyme which binds one mole of zinc per subunit. The bacterial and yeast enzymes are
monomeric while the mammalian one is homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and
mammals, the enzyme is a protein of about 130 to 180 residues while in fungi it is about twice as long. In these organisms
the enzyme is built out of the tandem repeat of a homologous domain. Two signature patterns for this family were
derived. The first one is located in the N-terminal region while the second one is located in the central section of the
protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.

[0727] Consensus pattern: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF]
Consensus pattern: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGLE]-x(2)-[DNC]
[0728] [ 1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).
[0729] 245. (Glypican)

Glypicans signature

Glypicans [1,2] are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a
glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

a) A signal sequence;
b) An extracellular domain of about 500 residues that contains 12 conserved cysteines probably involved in disulfide
bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;
c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor.

[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1).
- Glypican 2 (GPC2) or cerebroglycan.
- Glypican 3 (GPC3) or OCI-5. In man, defects in GPC3 are the cause of an X-linked genetic disease,
Simpson-Golabi-Behmel syndrome (SGBS).
- K-glypican.
- Glypican 5 (GPC5).
- Drosophila protein dally.

[0731] The signature pattern that was developed for glypicans is located in the central section of the extracellular
domain and contains five of the conserved cysteines.

[0732] Consensus pattern: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C [The C's are probably
involved in disulfide bonds]. Sequences known to belong to this class detected by the pattern: ALL, except for dally.

[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996).
[ 2] Watanabe K., Yamada H., Yamaguchi Y. J. Cell Biol. 130:1207-1218(1995).

[0733] 246. Granins signatures

Granins (chromogranins or secretogranins) [1] are a family of acidic proteins present in the secretory granules of a
wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known but they
seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of
peptide hormones and neuropeptides. Three members of this family of proteins show some sequence similarities:

- Chromogranin A (CGA) [2]. CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin
which strongly inhibits glucose-induced insulin release from the pancreas.
- Secretogranin 1 (chromogranin B). A sulfated protein of about 600 residues.
- Secretogranin 2 (chromogranin C). A sulfated protein of about 650 residues.

Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share
many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins.
Chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine
residues involved in a disulfide bond.

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L

Consensus pattern: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C [The two C's are linked
by a disulfide bond]

[ 1] Huttner W.B., Gerdes H.-H., Rosa P. Trends Biochem. Sci. 16:27-30(1991).
[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).

[0735] 247. grpE protein signature 

In prokaryotes the grpE protein [1] stimulates, jointly with dnaJ, the ATPase activity of the dnaK chaperone. It seems
to accelerate the release of ADP from dnaK, thus allowing dnaK to recycle more efficiently. GrpE is a protein of about
22 to 25 Kd. In yeast, an evolutionarily related mitochondrial protein (gene GRPE) has been shown [2] to associate with
the mitochondrial hsp70 protein and thus to play a role in the import of proteins from the cytoplasm. As a signature
pattern, the most conserved region of grpE was selected. It is located in the C-terminal section.

[0736] Consensus pattern: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-
[RI]-x-[SA]-x-V-x-[IV]

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).
[ 2] Bolliger L., Deloche O., Glick B.S., Georgopoulos C., Jenoe P., Kronidou N., Horst M., Morishima N., Schatz
G. EMBO J. 13:1998-2006(1994).

[0737] 248. Guanylate kinase signature and profile 

Guanylate kinase (EC 2.7.4.8) (GK) [1] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential
for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast)
and in vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [2,3,4]
to be structurally similar to the following proteins:

- Protein A57R (or SalG2R) from various strains of Vaccinia virus. This protein is highly similar to GK, but contains
a frameshift mutation in the N-terminal section and could therefore be inactive in that virus.

The following proteins are characterized by the presence in their sequence of one or more copies of the DHR domain,
a SH3 domain (see <PDOC50002>) as well as a C-terminal GK-like domain; these proteins are collectively termed
MAGUKs (membrane-associated guanylate kinase homologs) [5]:

- Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1). This protein is associated with septate
junctions in developing flies, and defects in the dlg1 gene cause neoplastic overgrowth of the imaginal disks.
- Mammalian tight junction protein Zo-1.
- A family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits.
This family currently consists of SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.
- Vertebrate 55 Kd erythrocyte membrane protein (p55). p55 is a palmitoylated, membrane-associated protein of
unknown function.
- Caenorhabditis elegans protein lin-2, which may play a structural role in the induction of the vulva.
- Rat protein CASK.
- Human protein DLG2.
- Human protein DLG3.

There is an ATP-binding site (P-loop) in the N-terminal section of GK. This region is not conserved in the GK-like
domain of the above proteins, which are therefore unlikely to be kinases. However, these proteins retain the residues
known, in GK, to be involved in the binding of GMP. As a signature pattern a highly conserved region was selected that
contains two arginines and a tyrosine which are involved in GMP-binding.

[0738] Consensus pattern: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK]

[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992).
[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).
[ 4] Zschocke P.D., Schiltz E., Schulz G.E. Eur. J. Biochem. 213:263-269(1993).
[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1994).

[0739] 249. (Glyco_hydro_35)

Glycosyl hydrolases family 35 putative active site

[0740] Beta-galactosidases (EC 3.2.1.23) from mammals, fungi, plants and the bacterium Xanthomonas manihotis are
evolutionarily related [1,2]. They belong to family 35 in the classification of glycosyl hydrolases [3,E1].
[0741] Mammalian beta-galactosidase is a lysosomal enzyme (gene GLB1) which cleaves the terminal galactose
from gangliosides, glycoproteins, and glycosaminoglycans and whose deficiency is the cause of the genetic disease
Gm(1) gangliosidosis (Morquio disease type B).

[0742] One of the best conserved regions in these enzymes contains a glutamic acid residue which, on the basis of
similarities with other families of glycosyl hydrolases [4], probably acts as the proton donor in the catalytic mechanism.
This region was used as a signature pattern.
[0743] Consensus pattern: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY] [The second E is the putative active site residue]
Sequences known to belong to this class detected by the pattern: ALL.
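
Because the annotation singles out one residue within the pattern (the second E), a scan can report the position of that residue rather than only the start of the match. The sketch below shows one possible way to do this in Python with a capture group. The regular expression is a hand translation of the family 35 pattern, and the function name and the test sequence are hypothetical.

```python
import re

# Hand-translated regex for G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY].
# The capture group isolates the second E, the putative active-site residue.
GH35_SIGNATURE = re.compile(r"GGP[LIVM]{2}.{2}Q.EN(E)[FY]")

def find_putative_active_sites(sequence: str) -> list:
    """Return the 1-based position of the second E for every signature match."""
    return [m.start(1) + 1 for m in GH35_SIGNATURE.finditer(sequence)]

# Synthetic sequence containing one copy of the signature (not a real protein).
toy_sequence = "AAAGGPLIAAQAENEFAAA"
print(find_putative_active_sites(toy_sequence))
```

A match only indicates that the signature is present; it does not by itself establish catalytic activity.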

[ 1] Taron C.H., Benner J.S., Hornstra L.J., Guthrie E.P. Glycobiology 5:603-610(1995).
[ 2] Carey A.T., Holt K., Picard S., Wilde R., Tucker G.A., Bird C.R., Schuch W., Seymour G.B. Plant Physiol. 108:
1099-1107(1995).
[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).
[ 4] Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.-P., Davies G. Proc. Natl. Acad. Sci. U.S.A. 92:
7090-7094(1995).

[0744] 250. (Glyco_hydro_16)

Glycosyl hydrolases family 16 signature

[0745] It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis
of sequence similarities:

- Bacterial beta-1,3-1,4-glucanases, or lichenases, (EC 3.2.1.73) mainly from Bacillus but also from Clostridium
thermocellum (gene licB), Fibrobacter succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

[0746] Two closely clustered conserved glutamates have been shown [2] to be involved in the catalytic activity of
Bacillus licheniformis lichenase. The region that contains these residues was used as a signature pattern.

[0747] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).

[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature
(aka glycosyl_hydro4)

[0749] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) from various plants. This enzyme
may be involved in the defense of plants against pathogens through its ability to degrade fungal cell wall
polysaccharides.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme
may play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore release during
sporulation.
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

[0750] The best conserved region in the sequence of these enzymes is located in their central section. This region
contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and
it also contains a conserved glutamate which has been shown [3] to act as the nucleophile in the catalytic mechanism.
This region was used as a signature pattern.

[0751] Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site
residue] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).
[ 3] Varghese J.N., Garrett T.P.J., Colman P.M., Chen L., Hoj P.J., Fincher G.B. Proc. Natl. Acad. Sci. U.S.A. 91:
2785-2789(1994).



[0752] 252. (Glyco_hydro_3) 
Glycosyl hydrolases family 3 active site 

[0753] It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces
fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA),
Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

[0754] One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has
been shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region
was used as a signature pattern.

[0755] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28)

Polygalacturonase active site (aka PG)

[0757] Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) [1,2] catalyzes the random hydrolysis of
1,4-alpha-D-galactosiduronic linkages in pectate and other galacturonans. In fruit, polygalacturonase plays an
important role in cell wall metabolism during ripening. In plant bacterial pathogens such as Erwinia carotovora or
Pseudomonas solanacearum and fungal pathogens such as Aspergillus niger, polygalacturonase is involved in
maceration and soft-rotting of plant tissue.

[0758] Exo-poly-alpha-D-galacturonosidase (EC 3.2.1.82) (exoPG) [3] hydrolyzes pectic acid from the non-reducing
end, releasing digalacturonate.
[0759] Prokaryotic and eukaryotic PG and exoPG share a few regions of sequence similarity. The best conserved of
these regions was selected. It is centered on a conserved histidine most probably involved in the catalytic mechanism
[4].

[0760] Consensus pattern: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S [H is the putative
active site residue] Sequences known to belong to this class detected by the pattern: ALL.

[0761] Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5].

[ 1] Ruttkowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. Biophys. Acta 1087:
104-106(1990).
[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).
[ 3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990).
[ 4] Bussink H.J.D., Buxton F.P., Visser J. Curr. Genet. 19:467-474(1991).
[ 5] Henrissat B. Biochem. J. 280:309-316(1991).

[0762] 254. (Glyco_hydro_32)

Glycosyl hydrolases family 32 active site

[0763] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Inulinase (EC 3.2.1.7) (or inulase) from the fungus Kluyveromyces marxianus.
- Beta-fructofuranosidase (EC 3.2.1.26), commonly known as invertase in fungi and plants and as sucrase in bacteria
(gene sacA or scrB).
- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid pRSD2.
- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis.

[0764] One of the conserved regions in these enzymes is located in the N-terminal section and contains an aspartic
acid residue which has been shown [3], in yeast invertase, to be important for the catalytic mechanism. This region was
used as a signature pattern.

[0765] Consensus pattern: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G [D is the active site residue] Sequences known to belong
to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Gunasekaran P., Karunakaran T., Cami B., Mukundan A.G., Preziosi L., Baratti J. J. Bacteriol. 172:6727-6735
(1990).
[ 3] Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).

[0766] 255. (Glyco_hydro_1) 
Glycosyl hydrolases family 1 signatures 

[0767] It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus
polymyxa, and Caldocellum saccharolyticum.
- Two plant (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and
lacS).
- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus
lactis, and Staphylococcus aureus.
- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia
chrysanthemi (gene arbB).
- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).
- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane
glycoprotein, is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which
contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl
hydrolases.

[0768] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by
acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern we selected a
conserved region, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0769] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue] Sequences known to belong to this class detected by the pattern: ALL.

[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the
LPH precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0771] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA] Sequences
known to belong to this class detected by the pattern: ALL.
[0772] Note: this pattern will pick up the last three domains of LPH.
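
The two notes above imply that a multi-domain protein can carry different numbers of copies of each signature. As an illustration only, the following Python sketch counts non-overlapping hits of both family 1 signatures in a sequence. The regular expressions are hand translations of the two consensus patterns, and the four-domain test sequence is synthetic and merely mimics the LPH situation described in the notes; it is not the LPH sequence.

```python
import re

# Hand-translated regexes for the two family 1 consensus patterns.
ACTIVE_SITE = re.compile(r"[LIVMFSTC][LIVFYS][LIV][LIVMST]ENG[LIVMFAR][CSAGN]")
N_TERMINAL = re.compile(r"F.[FYWM][GSTA].[GSTA].[GSTA]{2}[FYNH][NQ].E.[GSTA]")

def count_signature_hits(sequence: str) -> dict:
    """Count non-overlapping matches of each signature in the sequence."""
    return {
        "active_site": len(ACTIVE_SITE.findall(sequence)),
        "n_terminal": len(N_TERMINAL.findall(sequence)),
    }

# Synthetic building blocks (hypothetical, for illustration only).
n_term_motif = "FAFGAGAGGFNAEAG"   # satisfies the N-terminal signature
active_motif = "LLILENGLG"         # satisfies the active-site signature
dead_motif = "LLILQNGLG"           # same motif with the glutamate replaced

domain_1 = "PPPPPPPP"                           # neither signature
domain_2 = n_term_motif + "PPPP" + dead_motif   # N-terminal signature only
domain_3 = n_term_motif + "PPPP" + active_motif
domain_4 = n_term_motif + "PPPP" + active_motif
toy_protein = "KK".join([domain_1, domain_2, domain_3, domain_4])

print(count_signature_hits(toy_protein))
```

In this toy case the active-site signature is found in two domains and the N-terminal signature in three, which parallels the behavior the notes describe for LPH.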



[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).
[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

[0773] 256. Glyco_hydro_20
Glycosyl hydrolase family 20
Previous Pfam IDs: glycosyl_hydr11;
Number of members: 33
[0774] 257. (Glyco_hydro_9)
Glycosyl hydrolases family 9 active sites signatures
(aka Glycosyl_hydr12)

[0775] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities,
can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases
family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest
the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the fruit
ripening process.

[0776] Two of the most conserved regions in these enzymes are centered on conserved residues which have been
shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The
first region contains an active site histidine and the second region contains two catalytically important residues: an
aspartate and a glutamate. Both regions were used as signature patterns.

[0777] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for Cellulomonas fimi cenC and
Streptomyces reticuli cel1.

[0778] Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]
Sequences known to belong to this class detected by the pattern: ALL, except for Fibrobacter succinogenes endA whose
sequence seems to be incorrect.

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318
(1991).
[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0779] 258. Matrix protein (MA), p15 (GAG_ma)

[0780] The matrix protein, p15, is encoded by the gag gene. MA is involved in pathogenicity [1].
[0781] [1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.
[0782] 259. Gag polyprotein, inner coat protein p12 (GAG_P12)

[0783] The retroviral p12 is a virion structural protein. p12 is proline rich. The function carried out by p12 in assembly
and replication is unknown. p12C is associated with pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.

[0784] 260. Glutamine synthetase signatures (GLN-SYNT) 

Glutamine synthetase (EC 6.3.1.2) (GS) [1] plays an essential role in the metabolism of nitrogen by catalyzing the
condensation of glutamate and ammonia to form glutamine. There seem to be three different classes of GS [2,3,4]:

- Class I enzymes (GSI) are specific to prokaryotes, and are oligomers of 12 identical subunits. The activity of
GSI-type enzymes is controlled by the adenylation of a tyrosine residue. The adenylated enzyme is inactive.
- Class II enzymes (GSII) are found in eukaryotes and in bacteria belonging to the Rhizobiaceae, Frankiaceae, and
Streptomycetaceae families (these bacteria also have a class-I GS). GSII are octamers of identical subunits. Plants
have two or more isozymes of GSII; one of the isozymes is translocated into the chloroplast.
- Class III enzymes (GSIII) have currently only been found in Bacteroides fragilis and in Butyrivibrio fibrisolvens.
GSIII is a hexamer of identical chains. It is much larger (about 700 amino acids) than the GSI (450 to 470 amino
acids) or GSII (350 to 420 amino acids) enzymes.

While the three classes of GS's are clearly structurally related, the sequence similarities are not so extensive. As
signature patterns three conserved regions were selected. The first pattern is based on a conserved tetrapeptide in
the N-terminal section of the enzyme; the second one is based on a glycine-rich region which is thought to be involved
in ATP-binding. The third pattern is specific to class I glutamine synthetases and includes the tyrosine residue which
is reversibly adenylated.

[0785] Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY]

Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S

Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y [Y is the site of adenylation]
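
Since the third pattern is described as specific to class I enzymes, the three signatures can be combined into a simple report. The Python sketch below is one possible way to do this and rests on several assumptions: the regular expressions are hand translations of the three consensus patterns, the key names and the function name are hypothetical, and the test sequences are synthetic. A hit on the adenylation signature is treated only as a hint that a sequence may be a class I enzyme.

```python
import re

# Hand-translated regexes for the three glutamine synthetase signatures.
GS_PATTERNS = {
    "n_terminal": re.compile(r"[FYWL]DGSS.{6,8}[DENQSTAK][SA][DE].{2}[LIVMFY]"),
    "atp_binding": re.compile(r"KP[LIVMFYA].{3,5}[NPAT]G[GSTAN]G.H.{3}S"),
    "adenylation": re.compile(r"K[LIVM].{5}[LIVMA]D[RK][DN][LI]Y"),
}

def gs_signature_report(sequence: str) -> dict:
    """Report which signatures match and flag possible class I enzymes."""
    report = {name: bool(rx.search(sequence)) for name, rx in GS_PATTERNS.items()}
    # The adenylation signature is stated to be specific to class I enzymes.
    report["class_I_candidate"] = report["adenylation"]
    return report

# Synthetic motifs (hypothetical; each satisfies one signature).
n_term = "FDGSSAAAAAADSDAAL"
atp = "KPLAAANGGGAHAAAS"
adenyl = "KLAAAAALDRDLY"

toy_class_I = n_term + "GG" + atp + "GG" + adenyl
toy_other = n_term + "GG" + atp

print(gs_signature_report(toy_class_I))
print(gs_signature_report(toy_other))
```

The report does not distinguish class II from class III; according to the description above, those classes differ mainly in subunit count and chain length.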

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith W.W. Cold Spring
Harbor Symp. Quant. Biol. 52:483-490(1987).
[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J., Wohlleben W., Tateno
Y. Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993).
[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989).
[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994).

[0786] 261. Globins profile (globin1)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. They belong to a very large
and well studied family which is widely distributed in many organisms. The major groups of globins are:

- Hemoglobins (Hb) from vertebrates. Hb is the protein responsible for transporting oxygen from the lungs to other
tissues. It is a tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or
fetal forms of hemoglobin where the alpha or the beta chains are replaced by a chain with higher oxygen affinity, as
for the gamma, delta, epsilon and zeta chains in mammals, for example.
- Myoglobins (Mg) from vertebrates. Mg is a monomeric protein responsible for oxygen storage in muscles.
- Invertebrate globins [2]. A wide variety of globins are found in invertebrates. Molluscs generally have one or two
muscle globins which are either monomeric or dimeric. Insects, such as the midge Chironomus thummi, have a large
set of extracellular globins. Nematodes and annelids have a variety of intracellular and extracellular globins; some of
them are multi-domain polypeptides (from two up to nine-domain globins) and some produce large, disulfide-bonded
aggregates.
- Leghemoglobins (Lg) from the root nodules of leguminous plants. Lg provides oxygen for bacteroids.
- Flavohemoproteins from bacteria (Escherichia coli hmpA) and fungi [3]. These proteins consist of two distinct
domains: an N-terminal globin domain and a C-terminal FAD-containing reductase domain. In bacteria such as
Vitreoscilla, the enzyme-associated globin is a single domain protein.

All these globins seem to have evolved from a common ancestor. The profile developed to detect members of the globin
family is based on a structural alignment of selected globin sequences.

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York(1988).
[ 2] Goodman M., Pedwaydon J., Czelusniak J., Suzuki T., Gotoh T., Moens L., Shishikura F., Walz D., Vinogradov S.
J. Mol. Evol. 27:236-249(1988).

[0787] Plant hemoglobins signature (globin2)

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants. Leghemoglobins are structurally
and functionally related to hemoglobin and myoglobin. By providing oxygen to the bacteroids, they are essential for
symbiotic nitrogen fixation. Structurally related hemoglobins from the nodules of non-leguminous plants [2,3], and from
the roots of non-nodulating plants [4] have been recently sequenced. A signature pattern was developed that picks up
the sequences of plant hemoglobins exclusively.
[0788] Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1986).
[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).
[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).
[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J. Nature 331:178-180(1988).
[0789] 262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enz)

[0790] Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or
condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There
are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-I aldolases [3], mainly
found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base intermediate between the C-2
carbonyl group of the substrate (dihydroxyacetone phosphate) and the epsilon-amino group of a lysine residue. In
vertebrates, three forms of this enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain.
The sequence around the lysine involved in the Schiff-base is highly conserved and can be used as a signature for
this class of enzyme.

[0791] Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in Schiff-base formation]

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).
[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).
[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988).

[0792] 263. Glycosyl hydrolases family 11 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC
3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria
produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can
be classified into families. One of these families is known as the cellulase family G [3] or as the glycosyl hydrolases
family 11 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase C (xynC).
- Bacillus circulans, pumilus, stearothermophilus and subtilis xylanase (xynA).
- Clostridium acetobutylicum xylanase (xynB).
- Clostridium stercorarium xylanase A (xynA).
- Fibrobacter succinogenes xylanase C (xynC), which consists of two catalytic domains that both belong to family 10.
- Neocallimastix patriciarum xylanase A (xynA).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an
N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of
short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl
hydrolases.
- Schizophyllum commune xylanase A.
- Streptomyces lividans xylanases B (xlnB) and C (xlnC).
- Trichoderma reesei xylanases I and II.

Two of the conserved regions in these enzymes are centered on glutamic acid residues which have both been shown
[5], in Bacillus pumilus xylanase, to be necessary for catalytic activity. Both regions were used as signature patterns.

[0793] Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site residue]
Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is an active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Ko E.P., Akatsuka H., Moriyama H., Shinmyo A., Hata Y., Katsube Y., Urabe I., Okada H. Biochem. J. 288:
117-121(1992).

[0794] 264. Glycosyl hydrolase family 14 
[0795] This family consists of beta-amylases.
[0796] 265. Glycosyl hydrolases family 1 signatures

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified
into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus
polymyxa, and Caldocellum saccharolyticum.
- Two plant (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and
lacS).
- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus.
- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia
chrysanthemi (gene arbB).
- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).
- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane
glycoprotein, is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which
contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl
hydrolases.

One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown
[5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by acting as a
nucleophile. This region was used as a signature pattern. As a second signature pattern a conserved region was
selected, found in the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.

[0797] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue]

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH
precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0798] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991).
[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).
[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).
[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:
5887-5889(1990).

[0799] 266. Glycosyl hydrolases family 2 signatures 

30 It has been shown [1 ,2, E1J that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified 
into a single family: - Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and ebgA), 
Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella pneumoniae. Lactobacillus delbrueckii. or 
Streptococcus thermophilus and from the fungi Kluyveromyces lactis. - Beta-glucuronidase (EC 3.2. 1.31) from Es- 
cherichia coli (gene uidA) and from mammals. One of the consented regions in these enzymes is centered on a con- 

35 served glutamic acid residue which has been shown [3], in Escherichia coli lacZ, to be the general acidise catalyst 
in the active site of the enzyme. This region was used as a signature pattem. As a second signature pattem a highly 
consen/ed region was selected located some sixty residues upstream from the active site glutamate. 
[0800] Consensus pattern: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-[LIVMFYW](4)

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

[0801] 267. Glycosyl hydrolases family 3 active site 

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified into a single family:

- Beta glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta glucosidases from the bacteria Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been shown [3], in Aspergillus wentii beta-glucosidase A3, to be implicated in the catalytic mechanism. This region was used as a signature pattern.

[0802] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x [D is the active site residue]

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 826:459-465(1980).

[0803] 268. Glycosyl hydrolases family 8 signature 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family D [3] or as the glycosyl hydrolases family 8 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Acetobacter xylinum endonuclease cmcAX.
- Bacillus strain KSM-330 acidic endonuclease K (Endo-K).
- Cellulomonas josui endoglucanase 2 (celB).
- Cellulomonas uda endoglucanase.
- Clostridium cellulolyticum endoglucanase C (celCC).
- Clostridium thermocellum endoglucanase A (celA).
- Erwinia chrysanthemi minor endoglucanase y (celY).
- Bacillus circulans beta-glucanase (EC 3.2.1.73).
- Escherichia coli hypothetical protein yhjM.

The most conserved region in these enzymes is a stretch of about 20 residues that contains two conserved aspartates. The first aspartate is thought [5] to act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.

Consensus pattern: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW] [The first D is an active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 269. Glycosyl hydrolases family 9 active sites signatures 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as Avocado or French bean. In plants this enzyme may be involved in the fruit ripening process.

Two of the most conserved regions in these enzymes are centered on conserved residues which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The first region contains an active site histidine and the second region contains two catalytically important residues: an aspartate and a glutamate. Both regions were used as signature patterns.

[0805] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]

Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318(1991).
[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 265:319-324(1992).

[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh)

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric NAD-binding enzyme common to both the glycolytic and gluconeogenic pathways. A cysteine in the middle of the molecule is involved in forming a covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly divergent archaebacterial GAPDH [2]. Escherichia coli D-erythrose 4-phosphate dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly related to GAPDH [3].

[0807] Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue]

[ 1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976).
[ 2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).
[ 3] Zhao G., Pease A.J., Bharani N., Winkler M.E. J. Bacteriol. 177:2804-2812(1995).
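
The consensus patterns in these entries use PROSITE-style notation: residues are separated by hyphens, `x` stands for any residue, `[..]` lists allowed residues, `{..}` lists excluded residues, and `(n)` or `(n,m)` gives a repeat count. As a minimal sketch, assuming that reading of the notation, the GAPDH pattern above can be translated into a regular expression and used to locate the active site cysteine. The function name `prosite_to_regex` and the peptide `TOY_SEQUENCE` are hypothetical and serve only as illustration; the peptide is not a real GAPDH sequence.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as (2) or (2,3).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                       # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} = any residue except these
        # [..] classes and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


# Pattern from the GAPDH entry; C (third element) is the active site residue.
GAPDH_PATTERN = "[ASV]-S-C-[NT]-T-x(2)-[LIM]"

# Hypothetical peptide used only to exercise the pattern.
TOY_SEQUENCE = "MKVGINGFASCTTNCLAPLAK"

regex = prosite_to_regex(GAPDH_PATTERN)
match = re.search(regex, TOY_SEQUENCE)

# The cysteine is at offset 2 inside the match; report it 1-based.
active_site_pos = match.start() + 2 + 1
print(regex, match.group(0), active_site_pos)
```

The same helper handles the other notations that appear in this section, for example `x(2,3)` and the excluded-residue form `{P}`.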

[0808] 271. Granulins signature 
Granulins [1] are a family of cysteine-rich peptides of about 6 Kd which may have multiple biological activities. A precursor protein (known as acrogranin) potentially encodes seven different forms of granulin (grnA to grnG) which are probably released by post-translational proteolytic processing. A schematic representation of the structure of a granulin is shown below:

xxxCxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxx
                                  **************

'C': conserved cysteine probably involved in a disulfide bond.
'*': position of the pattern.

Granulins are evolutionarily related to PMP-D1, a peptide extracted from the pars intercerebralis of migratory locusts [2].

[0809] Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [The four C's are probably involved in disulfide bonds]

[ 1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992).
[ 2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).
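
The granulin pattern fixes four cysteines with a defined spacing, so its position in the schematic can be worked out from the cysteine skeleton alone. The sketch below assumes the schematic string as it survives in this text (it may be truncated at the right-hand end) and reduces the pattern to `C`/`x` symbols before searching. The variable names are illustrative only.

```python
# Schematic of a granulin as given in the entry above, split into its
# cysteine-delimited segments so the spacing is easy to check by eye.
SCHEMATIC = (
    "xxxC" "xxxxxC" "xxxxxCC" "xxxxxxxxCC"
    "xxxxxxCC" "xxxxxCC" "xxxxxC" "xxxx"
)

# Consensus pattern C-x-D-x(2)-H-C-C-P-x(4)-C written out one residue per item.
PATTERN_ELEMENTS = ["C", "x", "D", "x", "x", "H", "C", "C",
                    "P", "x", "x", "x", "x", "C"]

# Keep only the cysteine information: every non-C position becomes 'x'.
skeleton = "".join("C" if e == "C" else "x" for e in PATTERN_ELEMENTS)

# Locate the skeleton in the schematic (0-based start, exclusive end).
start = SCHEMATIC.find(skeleton)
end = start + len(skeleton)

# Row of asterisks marking the pattern position under the schematic.
marker_row = " " * start + "*" * len(skeleton)

print(SCHEMATIC)
print(marker_row)
```

Under these assumptions the cysteine spacing of the pattern fits the schematic at a single place, near the C-terminal end.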

[0810] 272. (HCV RdRp) Hepatitis C virus RNA dependent RNA polymerase

[0811] The RNA dependent RNA polymerase is also known as non-structural protein NS5B. NS5B is a 65 kDa protein that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane-bound replication complexes. These complexes transcribe the positive strand and the resulting minus strand is used as a template for the synthesis of genomic RNA. There are two viral proteins involved in the reaction, NS3 and NS5B [1,2].

[0812] [1] Lohmann V, Korner F, Herian U, Bartenschlager R; J Virol 1997;71:8416-8428.
[2] Behrens SE, Tomei L, De Francesco R; EMBO J 1996;15:12-22.
[3] Ishido S, Fujita T, Hotta H; Biochem Biophys Res Commun 1998;244:35-40.

[0813] 273. (HHH) Helix-hairpin-helix motif.

[0814] [1] Doherty AJ, Serpell LC, Ponting CP; Nucleic Acids Res 1996;24:2488-2497.

[0815] 274. HIT family signature

Recently a family of small proteins of about 12 to 16 Kd has been described [1]. This family currently consists of:

- Mammalian protein HINT (also known as Protein kinase C inhibitor 1 or PKCI-1). HINT was incorrectly thought to be a specific inhibitor of PKC. It has been shown to bind zinc.
- Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- FHIT, a human protein whose gene is altered in different tumors and which acts [3] as a diadenosine 5',5'''-P1,P3-triphosphate hydrolase (Ap3Aase) (EC 3.6.1.29) cleaving A-5'-PPP-5'A to yield AMP and ADP.
- Yeast proteins HNT1 and HNT2.
- Maize zinc-binding protein ZBP14.
- Escherichia coli hypothetical protein ycfR.
- Haemophilus influenzae hypothetical protein HI0961.
- Helicobacter pylori hypothetical protein HP0404.
- Methanococcus jannaschii hypothetical protein MJ0866.
- Mycobacterium leprae hypothetical protein U296A.
- Synechocystis strain PCC 6803 hypothetical protein slr1234.
- Caenorhabditis elegans hypothetical protein F21C3.3.
- A hypothetical 13.2 Kd protein in hisE 5'region in Azospirillum brasilense.
- A hypothetical 13.1 Kd protein in p37 5'region in Mycoplasma hyorhinis.
- A hypothetical 12.4 Kd protein in psbAII 5'region in Synechococcus strain PCC 7942.

All these proteins contain a region with three clustered histidines. This region is responsible for the designation of this family: HIT, for 'HIstidine Triad' [1]. This region was originally thought to be implicated in the binding of a zinc ion but was later identified [4] as part of the alpha-phosphate binding site of a nucleotide-binding domain. As a signature pattern, the region of the histidine triad was selected.

[0816] Consensus pattern: [NQA]-x(4)-[GAV]-x-[CF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA]

[ 1] Seraphin B. DNA Seq. 3:177-179(1992).
[ 2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995).
[ 3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W., Croce C.M., Ohta M., Huebner K. Biochemistry 35:11529-11535(1996).
[ 4] Brenner C., Garrison P., Gilmour J., Peisach D., Ringe D., Petsko G.A., Lowenstein J.M. Nat. Struct. Biol. 4:231-238(1997).

[0817] 275. Myc-type, 'helix-loop-helix' dimerization domain signature (HLH)

A number of eukaryotic proteins, which probably are sequence specific DNA-binding proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [1] that this domain is formed of two amphipathic helices joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH) domain mediates protein dimerization and has been found in the proteins listed below [2,3,E1,E2]. Most of these proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A (ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence 'CANNTG', also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of, but necessary for DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins lacking the basic domain (Emc, Id) function as negative regulators since they form heterodimers, but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW.

- The myc family of cellular oncogenes [4], which is currently known to contain four members: c-myc [E3], N-myc, L-myc, and B-myc. The myc genes are thought to play a role in cellular differentiation and proliferation.
- Proteins involved in myogenesis (the induction of muscle cells). In mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or herculin), in birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila nautilus (nau).
- Vertebrate proteins that bind specific DNA sequences ('E boxes') in various immunoglobulin chain enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB.
- Vertebrate neurogenic differentiation factor 1 that acts as differentiation factor during neurogenesis.
- Vertebrate MAX protein, a transcription regulator that forms a sequence-specific DNA-binding protein complex with myc or mad.
- Vertebrate Max Interacting Protein 1 (MXI1 protein) which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing for max.
- Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals, AH receptor nuclear translocator (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor (AHR), neuronal pas domain proteins (NPAS1 and NPAS2), endothelial pas domain protein 1 (EPAS1), mouse ARNT2, and human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator (ARNT), trachealess protein (TRH), and similar protein (SIMA).
- Mammalian transcription factors HES, which repress transcription by acting on two types of DNA sequences, the E box and the N box.
- Mammalian MAD protein (max dimerizer) which acts as transcriptional repressor and may antagonize myc transcriptional activity by competing for max.
- Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a symmetrical DNA sequence that is found in a variety of viral and cellular promoters.
- Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia.
- Human transcription factor AP-4.
- Mouse helix-loop-helix proteins MATH-1 and MATH-2 which activate E box-dependent transcription in collaboration with E47.
- Mammalian stem cell protein (SCL) (also known as tal1), a protein which may play an important role in hemopoietic differentiation. SCL is involved, by chromosomal translocation, in stem-cell leukemia.
- Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic DNA-binding domain but are able to form heterodimers with other HLH proteins, thereby inhibiting binding to DNA.
- Drosophila extra-macrochaetae (emc) protein, which participates in sensory organ patterning by antagonizing the neurogenic activity of the achaete-scute complex. Emc is the homolog of mammalian Id proteins.
- Human Sterol Regulatory Element Binding Protein 1 (SREBP-1), a transcriptional activator that binds to the sterol regulatory element 1 (SRE-1) found in the flanking region of the LDLR gene and in other genes.
- Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5 (achaete) and T8 (asense). The AS-C proteins are involved in the determination of the neuronal precursors in the peripheral nervous system and the central nervous system.
- Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins.
- Drosophila atonal protein (ato) which is involved in neurogenesis.
- Drosophila daughterless (da) protein, which is essential for neurogenesis and sex-determination.
- Drosophila deadpan (dpn), a hairy-like protein involved in the functional differentiation of neurons.
- Drosophila delilah (dei) protein, which plays an important role in the differentiation of epidermal cells into muscle.
- Drosophila hairy (h) protein, a transcriptional repressor which regulates the embryonic segmentation and adult bristle patterning.
- Drosophila enhancer of split proteins E(spl), hairy-like proteins that are active during neurogenesis and also act as transcriptional repressors.
- Drosophila twist (twi) protein, which is involved in the establishment of germ layers in embryos.
- Maize anthocyanin regulatory proteins R-S and LC.
- Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in chromosomal segregation. It binds to a highly conserved DNA sequence, found in centromeres and in several promoters.
- Yeast INO2 and INO4 proteins.
- Yeast phosphate system positive regulatory protein PHO4 which interacts with the upstream activating sequence of several acid phosphatase genes.
- Yeast serine-rich protein TYE7 that is required for Ty-mediated ADH2 expression.
- Neurospora crassa nuc-1, a protein that activates the transcription of structural genes for phosphorus acquisition.
- Fission yeast protein esc1 which is involved in the sexual differentiation process.

The schematic representation of the helix-loop-helix domain is shown here:

xxxxxxxxxxxxxxxxxxxxxxxx          xxxxxxxxxxxxxxxxxxxxxxx
  Amphipathic helix 1      Loop     Amphipathic helix 2

The signature pattern developed to detect this domain spans completely the second amphipathic helix.

[0818] Consensus pattern: [DENSTAP]-[KTR]-[LIVMAGSNT]-{FYWCPHKR}-[LIVMT]-[LIVM]-x(2)-[STAV]-[LIVMSTACKR]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMRKHQ]

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[ 2] Garrell J., Campuzano S. BioEssays 13:493-498(1991).
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).
[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990).
[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994).
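
The E-box core 'CANNTG' named in the HLH entry is a degenerate DNA motif in which N stands for any nucleotide. A minimal sketch of scanning a DNA string for E-boxes follows. The function name `find_eboxes` and the sequence `TOY_PROMOTER` are hypothetical, and a lookahead is used so that overlapping sites would also be reported. Because the reverse complement of CANNTG again has the form CANNTG, scanning one strand is enough to find sites on both.

```python
import re

# E-box core CANNTG; the lookahead makes overlapping hits visible.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every E-box in the sequence."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(dna.upper())]


# Hypothetical sequence used only for illustration.
TOY_PROMOTER = "GGCACGTGTTCAGCTGAA"

hits = find_eboxes(TOY_PROMOTER)
for position, site in hits:
    print(position, site)
```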

[0819] 276. HMG14 and HMG17 signature

High mobility group (HMG) proteins are a family of relatively low molecular weight nonhistone components in chromatin. HMG14 and HMG17 [1], two related proteins of about 100 amino acid residues, bind to the inner side of the nucleosomal DNA thus altering the interaction between the DNA and the histone octamer. These two proteins may be involved in the process which maintains transcribable genes in a unique chromatin conformation. The trout nonhistone chromosomal protein H6 (histone T) also belongs to this family. As a signature pattern a conserved stretch of 10 residues located in the N-terminal section of HMG14 and HMG17 was selected.

[0820] Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P

[0821] [ 1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996).

[0822] 277. Hydroxymethylglutaryl-coenzyme A lyase active site (HMGL1)

3-hydroxy-3-methylglutaryl-coenzyme A lyase (HMG-CoA lyase or HL) (EC 4.1.3.4) catalyzes the transformation of HMG-CoA into acetyl-CoA and acetoacetate. In vertebrates it is a mitochondrial enzyme which is involved in ketogenesis and in leucine catabolism [1]. In some bacteria, such as Pseudomonas mevalonii, it is involved in mevalonate catabolism (gene mvaB). A cysteine has been shown [2], in mvaB, to be required for the activity of the enzyme. The region around this residue is perfectly conserved and is used as a signature pattern.

[0823] Consensus pattern: S-V-A-G-L-G-G-C-P-Y [C is the active site residue]

[ 1] Mitchell G.A., Robert M.-F., Hruz P.W., Wang S., Fontaine G., Behnke C.E., Mende-Mueller L.M., Schappert K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem. 268:4376-4381(1993).
[ 2] Hruz P.W., Narasimhan C., Miziorko H.M. Biochemistry 31:6842-6847(1992).

[0824] Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2)

The following enzymes have been shown [1] to be functionally as well as evolutionarily related:

- Alpha-isopropylmalate synthase (EC 4.1.3.12) which catalyzes the first step in the biosynthesis of leucine, the condensation of acetyl-CoA and alpha-ketoisovalerate to form 2-isopropylmalate.
- Homocitrate synthase (EC 4.1.3.21) (gene nifV) which is involved in the biosynthesis of the iron-molybdenum cofactor of nitrogenase and catalyzes the condensation of acetyl-CoA and alpha-ketoglutarate into homocitrate.
- Soybean late nodulin 56.
- Methanococcus jannaschii hypothetical proteins MJ0503, MJ1195 and MJ1392.

Two conserved regions were selected as signature patterns for these enzymes. The first region is located in the N-terminal section while the second region is located in the central section and contains two conserved histidine residues which could be implicated in the catalytic mechanism.

[0825] Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K

Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI]

[0826] [ 1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991).
[0827] 278. (HMG COA synt) Hydroxymethylglutaryl-coenzyme A synthase active site

Hydroxymethylglutaryl-coenzyme A synthase (EC 4.1.3.5) (HMG-CoA synthase) catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA to produce HMG-CoA and CoA [1]. In vertebrates there are two isozymes located in different subcellular compartments: a cytosolic form which is the starting point of the mevalonate pathway which leads to cholesterol and other sterolic and isoprenoid compounds, and a mitochondrial form responsible for ketone body biosynthesis. HMG-CoA synthase is also found in other eukaryotes such as insects, plants and fungi. A cysteine is known to act as the catalytic nucleophile in the first step of the reaction, the acetylation of the enzyme by acetyl-CoA. The conserved region around this active site residue was used as a signature pattern.

[0828] Consensus pattern: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G [C is the active site residue]

[0829] [ 1] Rokosz L.L., Boulton D.A., Butkiewicz E.A., Sanyal G., Cueto M.A., Lachance P.A., Hermes J.D. Arch. Biochem. Biophys. 312:1-13(1994).

[0830] 279. HMG (high mobility group) box

[0831] 280. HSF-type DNA-binding domain signature

[0832] Heat shock factor (HSF) is a DNA-binding protein that specifically binds heat shock promoter elements (HSE). HSE is a palindromic element rich with repetitive purine and pyrimidine motifs: 5'-nGAAnnTTCnnGAAnnTTCn-3'. HSF is expressed at normal temperatures but is activated by heat shock or chemical stressors [1,2]. The sequences of HSF from various species show extensive similarity in a region of about 90 amino acids, which has been shown [3] to bind DNA. Some other proteins also contain a HSF domain, these are:

- Yeast SFL1, a protein involved in cell surface assembly and regulation of the gene related to flocculation (asexual cell aggregation) [4].
- Yeast transcription factor SKN7 (or BRY1 or POS9), which binds to the promoter elements SCB and MCB essential for the control of G1 cyclins expression [5].
- Yeast MGA1.
- Yeast hypothetical protein YJR147w.

A pattern was derived from the most conserved part of the HSF DNA-binding domain, its central region.

[0833] Consensus pattern: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-[LIVM]

[ 1] Sorger P.K. Cell 65:363-366(1991).
[ 2] Mager W.H., Moradas Ferreira P. Biochem. J. 290:1-13(1993).
[ 3] Vuister G.W., Kim S.-J., Orosz A., Marquardt J., Wu C., Bax A. Nat. Struct. Biol. 1:605-613(1994).
[ 4] Fujita A., Kikuchi Y., Kuhara S., Misumi Y., Matsumoto S., Kobayashi H. Gene 85:321-328(1989).
[ 5] Morgan B.A., Bouquin N., Merrill G.F., Johnston L.H. EMBO J. 14:5679-5689(1995).
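
The HSF entry describes the heat shock element as palindromic, with consensus 5'-nGAAnnTTCnnGAAnnTTCn-3'. The sketch below checks that property by comparing the consensus with its own reverse complement, and then builds a regular expression from the consensus to scan a sequence. It assumes that `n` denotes any of A, C, G or T; the helper names and the sequence `TOY_DNA` are hypothetical.

```python
import re

# Minimal IUPAC subset needed for this consensus: n means any nucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

HSE_CONSENSUS = "nGAAnnTTCnnGAAnnTTCn"


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, keeping N as N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def consensus_to_regex(consensus: str) -> str:
    """Expand each consensus symbol into the matching regex fragment."""
    return "".join(IUPAC[base] for base in consensus.upper())


# A palindromic DNA element reads the same on both strands.
is_palindromic = reverse_complement(HSE_CONSENSUS) == HSE_CONSENSUS.upper()

hse_regex = re.compile(consensus_to_regex(HSE_CONSENSUS))

# Hypothetical sequence: two flanking bases on each side of one HSE instance.
TOY_DNA = "TT" + "AGAACGTTCTAGAACGTTCT" + "GG"
hse_match = hse_regex.search(TOY_DNA)

print(is_palindromic, hse_match.start(), hse_match.group(0))
```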

[0834] 281. Heat shock hsp20 proteins family profile

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 20 Kd, known as the hsp20 proteins [2 to 5]. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric aggregates; their family is currently composed of the following members:

- Vertebrate heat shock protein hsp27 (hsp25), induced by a variety of environmental stresses.
- Drosophila heat shock proteins hsp22, hsp23, hsp26, hsp27, hsp67BA and BC.
- Caenorhabditis elegans hsp16 multigene family.
- Fungal HSP26 (budding yeast) and hsp30 (Neurospora crassa and Aspergillus nidulans).
- Plant small hsp's. Plants have four classes of hsp20: classes I and II which are cytoplasmic, class III which is chloroplastic and class IV which is found in the endomembrane.
- Alpha-crystallin A and B chains. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species; its main function appears to be to maintain the correct refractive index of the lens. It is also found in other tissues where it seems to act as a chaperone [6].
- Schistosoma mansoni major egg antigen p40. Structurally, p40 is built of two tandem hsp20 domains.
- A variety of prokaryotic proteins: ibpA and ibpB from Escherichia coli, hsp18 from Clostridium acetobutylicum, spore protein SP21 (hspA) from Stigmatella aurantiaca, Mycobacterium leprae 18 Kd antigen and Mycobacterium tuberculosis 14 Kd antigen.
- Methanococcus jannaschii hypothetical protein MJ0285.

Structurally, this family is characterized by the presence of a conserved C-terminal domain of about 100 residues. The profile developed to detect members of the hsp20 family is based on an alignment of this domain.

[0835] Sequences known to belong to this class detected by the profile: ALL.

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] de Jong W.W., Leunissen J.A.M., Voorter C.E.M. Mol. Biol. Evol. 10:103-126(1993).
[ 3] Caspers G.J., Leunissen J.A.M., de Jong W.W. J. Mol. Evol. 40:238-248(1995).
[ 4] Jaenicke R., Creighton T.E. Curr. Biol. 3:234-235(1993).
[ 5] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).
[ 6] Groenen P.J.T.A., Merck K.B., de Jong W.W., Bloemendal H. Eur. J. Biochem. 225:1-9(1994).

[0836] 282. Heat shock hsp70 proteins family signatures

[0837] Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 70 Kd, known as the hsp70 proteins [2,3,4]. In most species, there are many proteins that belong to the hsp70 family. Some of them are expressed under unstressed conditions. Hsp70 proteins can be found in different cellular compartments (nuclear, cytosolic, mitochondrial, endoplasmic reticulum, etc.). Some of the hsp70 family proteins are listed below:

- In Escherichia coli and other bacteria, the main hsp70 protein is known as the dnaK protein. A second protein, hscA, has been recently discovered. dnaK is also found in the chloroplast genome of red algae.
- In yeast, at least ten hsp70 proteins are known to exist: SSA1 to SSA4, SSB1, SSB2, SSC1, SSD1 (KAR2), SSE1 (MSI3) and SSE2.
- In Drosophila, there are at least eight different hsp70 proteins: HSP70, HSP68, and HSC-1 to HSC-6.
- In mammals, there are at least eight different proteins: HSPA1 to HSPA6, HSC70, and GRP78 (also known as the immunoglobulin heavy chain binding protein (BiP)).
- In the sugar beet yellow virus (SBYV), a hsp70 homolog has been shown [5] to exist.
- In archaebacteria, hsp70 proteins are also present [6].

All proteins belonging to the hsp70 family bind ATP. A variety of functions has been postulated for hsp70 proteins. It now appears [7] that some hsp70 proteins play an important role in the transport of proteins across membranes. They also seem to be involved in protein folding and in the assembly/disassembly of protein complexes [8]. Three signature patterns for the hsp70 family of proteins were derived; the first centered on a conserved pentapeptide found in the N-terminal section of these proteins; the two others on conserved regions located in the central part of the sequence.

[0838] Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC]

Consensus pattern: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-[LIVMFC]

Consensus pattern: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA]

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Pelham H.R.B. Cell 46:959-961(1986).
[ 3] Pelham H.R.B. Nature 332:776-777(1988).
[ 4] Craig E.A. BioEssays 11:48-52(1989).
[ 5] Agranovsky A.A., Boyko V.P., Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol. 217:603-610(1991).
[ 6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).
[ 7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-388(1988).
[ 8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).

[0839] 283. Heat shock hsp90 proteins family signature

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins, with an average molecular weight of 90 Kd, known as the hsp90 proteins. Proteins known to belong to this family are:

- Escherichia coli and other bacteria heat shock protein c62.5 (gene htpG).
- Vertebrate hsp 90-alpha (hsp 86) and hsp 90-beta (hsp 84).
- Drosophila hsp 82 (hsp 83).
- Trypanosoma cruzi hsp 85.
- Plants Hsp82 or Hsp83.
- Yeast and other fungi HSC82, and HSP82.
- The endoplasmic reticulum protein 'endoplasmin' (also known as Erp99 in mouse, GRP94 in hamster, and hsp 108 in chicken).

The exact function of hsp90 proteins is not yet known. In higher eukaryotes, hsp90 has been found associated with steroid hormone receptors, with tyrosine kinase oncogene products of several retroviruses, with eIF2alpha kinase, and with actin and tubulin. Hsp90 are probable chaperonins that possess ATPase activity [2,3]. As a signature pattern for the hsp90 family of proteins, a highly conserved region found in the N-terminal part of these proteins was selected.

[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993).
[ 3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).

[0841] 284. Helix-turn-helix (HTH3)

[0842] This large family of DNA binding helix-turn-helix proteins includes Cro Swiss:P03036 and CI Swiss:P03034.
[0843] 285. Heme oxygenase signature 

Heme oxygenase (EC 1.14.99.3) (HO) [1] is the microsomal enzyme that, in animals, carries out the oxidation of heme;
it cleaves the heme ring at the alpha methene bridge to form biliverdin and carbon monoxide. Biliverdin is subsequently
converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1 to HO-3.
The first two isozymes differ in their tissue expression and their inducibility: HO-1 is highly inducible by its substrate
heme and by various non-heme substances, while HO-2 is non-inducible. It has been suggested [2] that HO-2 could
be implicated in the production of carbon monoxide in the brain, where it is said to act as a neurotransmitter. In the
genome of the chloroplast of red algae as well as in cyanobacteria, there is a heme oxygenase (gene pbsA) that is the
key enzyme in the synthesis of the chromophoric part of the photosynthetic antennae [3]. A heme oxygenase is also
present in the bacterium Corynebacterium diphtheriae (gene hmuO), where it is involved in the acquisition of iron from
the host heme [4]. There is, in the central section of these enzymes, a well conserved region centered on a histidine
residue which is proposed to play a key role in binding the substrate heme at the active center of the enzyme. This
region was used as a signature pattern.

[0844] Consensus pattern: L-[IV]-A-H-[STACH]-Y-[STV]-[RT]-Y-[LIVM]-G [H binds the heme]

[ 1] Maines M.D. FASEB J. 2:2557-2568(1988). 
[ 2] Barinaga M. Science 259:309-309(1993). 

[ 3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[ 4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

[0845] 286. Hepatitis core antigen. 

[0846] The core antigen of hepatitis viruses possesses a carboxyl terminus rich in arginine. On this basis it was
predicted that the core antigen would bind DNA [1]. There is some experimental evidence to support this [2].
[0847] [1] Pasek M, Goto T, Gilbert W, Zink B, Schaller H, Mckay P, Leadbetter G, Murray K; Nature 1979;282:
575-579. [2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M, Milanesi G; J Virol 1989;63:4645-4652.
[0848] 287. Histidine biosynthesis protein

[0849] Proteins involved in steps 4 and 6 of the histidine biosynthesis pathway are contained in this family. Histidine
is formed by several complex and distinct biochemical reactions catalysed by eight enzymes. The enzymes in this
Pfam entry are called His6 and His7 in eukaryotes and HisA and HisF in prokaryotes.
[0850] [1] Fani R, Tamburini E, Mori E, Lazcano A, Lio P, Barberio C, Casalone E, Cavalieri D, Perito B, Polsinelli
M. Gene 1997;197:9-17. [2] Fani R, Lio P, Chiarelli I, Bazzicalupo M. J Mol Evol 1994;38:489-495.
[0851] 288. Histone deacetylase family 

[0852] Histones can be reversibly acetylated on several lysine residues. Regulation of transcription is caused in part
by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases are related
to other proteins [1].

[0853] [1] Leipe DD, Landsman D. Nucleic Acids Res 1997;25:3693-3697.
[0854] 289. Histidinol dehydrogenase signature

Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria,
fungi, and plants, the four-electron oxidation of L-histidinol to histidine. In bacteria HDH is a single chain polypeptide;
in fungi it is the C-terminal domain of a multifunctional enzyme which catalyzes three different steps of histidine
biosynthesis; and in plants it is expressed as a nuclear encoded protein precursor which is exported to the chloroplast [1].
As a signature pattern a highly conserved region located in the central part of HDH was selected. This region does not
correspond to the part of the enzyme that, in most, but not all HDH sequences contains a cysteine residue which, in
Salmonella typhimurium, has been said [2] to be important for the catalytic activity of the enzyme.

[0855] Consensus pattern: I-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-[SACL]-[DE]-
[LIVMFC]-[LIVM]-[SA]-x(2)-E-H-

[ 1] Nagai A., Ward E., Beck J., Tada S., Chang J.-Y., Scheidegger A., Ryals J. Proc. Natl. Acad. Sci. U.S.A. 88:
4133-4137(1991).

[ 2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).

[0856] 290. Homoserine dehydrogenase signature 

Homoserine dehydrogenase (EC 1.1.1.3) (HDh) [1,2] catalyzes NAD-dependent reduction of aspartate beta-semialdehyde
into homoserine. This reaction is the third step in a pathway leading from aspartate to homoserine. The latter
participates in the biosynthesis of threonine and then isoleucine as well as in that of methionine. HDh is found either
as a single chain protein as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal
aspartokinase domain and a C-terminal HDh domain as in bacteria such as Escherichia coli and in plants. As a signature
pattern, the best conserved region of HDh has been selected. This is a segment of 23 to 24 residues located in the
central section and that contains two conserved aspartate residues.

[0857] Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K-
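The stated segment length of 23 to 24 residues can be checked against the pattern by adding up the residue counts of its elements. The sketch below assumes the pattern reads as A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K and that a suffix such as x(2,3) means two to three residues; the function name pattern_span is hypothetical.

```python
import re

# Assumed reading of the homoserine dehydrogenase consensus pattern.
HDH_PATTERN = (
    "A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K"
)


def pattern_span(pattern):
    """Return (shortest, longest) number of residues a pattern can cover."""
    shortest = longest = 0
    for element in pattern.strip().split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a trailing hyphen
        # A repeat suffix is (n) or (n,m); without one the element counts once.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m:
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
        else:
            low = high = 1
        shortest += low
        longest += high
    return shortest, longest


print(pattern_span(HDH_PATTERN))
```

Under these assumptions the only variable element is x(2,3), which accounts for the one-residue difference between the shortest and longest match.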

[ 1] Thomas D., Barbey R., Surdin-Kerjan Y. FEBS Lett. 323:289-293(1993).
[ 2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).

[0858] 291. Haloacid dehalogenase-like hydrolase

[0859] This family is structurally different from the alpha/beta hydrolase family (abhydrolase). This family includes
L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two domains.
One is an inserted four helix bundle, which is the least well conserved region of the alignment, between residues
16 and 96 of Swiss:P24069. The rest of the fold is composed of the core alpha/beta domain. [1] Hisano T, Hata Y, Fujii
T, Liu JQ, Kurihara T, Esaki N, Soda K, J Biol Chem 1996;271:20322-20330.

[0860] 292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C)
A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural
similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong
to this family are:

- Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved
in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.
- Pl10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to Pl10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to Pl10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth
and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and specification of embryonic posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB gene for ribosomal
protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably interacts with 23S
ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2.
- Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while others
are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these
motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some other
proteins belong to a subfamily which have His instead of the second Asp and are thus said to be 'D-E-A-H-box'
proteins [3,5,6,E1]. Proteins currently known to belong to this subfamily are:

- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various ATP-requiring steps of the
pre-mRNA splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males, for dosage compensation of X chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or
cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the
homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit which acts with RNA polymerase to initiate transcription from
early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.

Signature patterns were developed for both subfamilies.
[0861] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]-
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR] -
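Because the two subfamilies differ at a single position of the box (Asp versus His), a sequence can be assigned to one or the other by testing both signatures in turn. The sketch below is illustrative only. It assumes the two patterns read [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN] and [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR], writes them directly as regular expressions, and uses made-up test fragments; classify_helicase is a hypothetical name.

```python
import re

# The two subfamily signatures written as regular expressions (assumed reading).
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")


def classify_helicase(sequence):
    """Report which subfamily signature, if any, occurs in the sequence."""
    seq = sequence.upper()
    if DEAD_BOX.search(seq):
        return "DEAD-box"
    if DEAH_BOX.search(seq):
        return "DEAH-box"
    return "no signature found"


# Made-up fragments for illustration; not taken from the sequence listing.
for fragment in ("GKVLDEADEMLSRG", "TSVIMIDEAHERTL", "MKTAYIAKQR"):
    print(fragment, "->", classify_helicase(fragment))
```

A match to either signature is only a first indication of family membership; as the note on the ATP/GTP-binding motif 'A' suggests, other motifs are expected in the same proteins.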

Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant
entry <PDOC00017>).

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122
(1989).

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

[0862] 293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)

[0863] Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier for several membrane-bound
oxygenases [1]. There are two homologous forms of b5, one found in microsomes and one found in the outer
membrane of mitochondria. Two conserved histidine residues serve as axial ligands for the heme group. The structure
of a number of oxidoreductases consists of the juxtaposition of a heme-binding domain homologous to that of b5 and
either a flavodehydrogenase or a molybdopterin domain. These enzymes are:

- Lactate dehydrogenase (EC 1.1.2.3) [2], an enzyme that consists of a flavodehydrogenase domain and a
heme-binding domain called cytochrome b2.

- Nitrate reductase (EC 1.6.6.1), a key enzyme involved in the first step of nitrate assimilation in plants, fungi and
bacteria [3,4]. Consists of a molybdopterin domain (see <PDOC00484>), a heme-binding domain called
cytochrome b557, as well as a cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1) [5], which catalyzes the terminal reaction in the oxidative degradation of
sulfur-containing amino acids. Also consists of a molybdopterin domain and a heme-binding domain.

This family of proteins also includes:

- TU-36B, a Drosophila muscle protein of unknown function [6].

- Fission yeast hypothetical protein SpAC1F12.10c.
- Yeast hypothetical protein YMR073C.

- Yeast hypothetical protein YMR272c.

[0864] A segment which includes the first of the two histidine heme ligands was used as a signature pattern for the
heme-binding domain of cytochrome b5 family.

[0865] Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand]-

[1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989).
[2] Guiard B. EMBO J. 4:3265-3272(1985).

[3] Calza R., Huttner E., Vincentz M., Rouze P., Galangau F., Vaucheret H., Cherel I., Meyer C., Kronenberger J.,
Caboche M. Mol. Gen. Genet. 209:552-562(1987).

[4] Crawford N.M., Smith M., Bellissimo D., Davis R.W. Proc. Natl. Acad. Sci. U.S.A. 85:5006-5010(1988).
[5] Guiard B., Lederer F. Eur. J. Biochem. 100:441-453(1979).
[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res. 17:6349-6367(1989).

[0866] 294. Hexapeptide-repeat containing-transferases signature

On the basis of sequence similarity, a number of transferases have been proposed [1,2,3,4] to belong to a single family.
These proteins are:

- Serine acetyltransferase (EC 2.3.1.30) (SAT) (gene cysE), an enzyme involved in cysteine biosynthesis.
- Azotobacter chroococcum nitrogen fixation protein nifP. NifP is most probably a SAT involved in the optimization
of nitrogenase activity.
- Escherichia coli thiogalactoside acetyltransferase (EC 2.3.1.18) (gene lacA), an enzyme involved in the biosynthesis
of lactose.
- UDP-N-acetylglucosamine acyltransferase (EC 2.3.1.129) (gene lpxA), an enzyme involved in the biosynthesis of
lipid A, a phosphorylated glycolipid that anchors the lipopolysaccharide to the outer membrane of the cell.
- UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase (EC 2.3.1.-) (gene lpxD or firA), which is also involved
in the biosynthesis of lipid A.
- Chloramphenicol acetyltransferase (CAT) (EC 2.3.1.28) from Agrobacterium tumefaciens, Bacillus sphaericus,
Escherichia coli plasmid IncFII NR79, Pseudomonas aeruginosa, Staphylococcus aureus plasmid pIP630. These CAT
are not evolutionarily related to the main family of CAT (see <PDOC00093>).
- Rhizobium nodulation protein nodL. NodL is an acetyltransferase involved in the O-acetylation of Nod factors.
- Bacterial maltose O-acetyltransferase (EC 2.3.1.79).
- Bacterial tetrahydrodipicolinate N-succinyltransferase (EC 2.3.1.117) (gene dapD) which catalyzes the fourth step
in the biosynthesis of diaminopimelate and lysine from aspartate semialdehyde.
- Bacterial N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) (gene glmU or gcaD or tms), an enzyme
involved in peptidoglycan and lipopolysaccharide biosynthesis.
- Staphylococcus aureus protein capG which is involved in biosynthesis of type 1 capsular polysaccharide.
- Yeast hypothetical protein YJL218w, which is highly similar to Escherichia coli lacA.
- Fission yeast hypothetical protein SpAC18B11.09c.
- Methanococcus jannaschii hypothetical protein MJ1064.

These proteins have been shown [3,4] to contain a repeat structure composed of tandem repeats of a [LIV]-G-x(4)
hexapeptide which, in the tertiary structure of lpxA [5], has been shown to form a left-handed parallel beta helix.
Our signature pattern is based on a fourfold repeat of this hexapeptide.

[0867] Consensus pattern: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-[STAVR]-x-[LIV]-
[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV]-
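The tandem repeat structure described above can be illustrated by counting how many [LIV]-G-x(4) hexapeptides follow one another without a gap. The sketch below uses the simple six-residue unit from the description rather than the full, more permissive signature pattern; the function name longest_tandem_run and the test sequence are hypothetical.

```python
import re

# One or more back-to-back [LIV]-G-x(4) units, captured inside a lookahead so
# that every start position (every reading frame) is examined.
HEXAPEPTIDE_RUN = re.compile(r"(?=((?:[LIV]G.{4})+))")


def longest_tandem_run(sequence):
    """Return the largest number of consecutive [LIV]-G-x(4) hexapeptides."""
    best = 0
    for match in HEXAPEPTIDE_RUN.finditer(sequence.upper()):
        best = max(best, len(match.group(1)) // 6)
    return best


# Made-up sequence holding four consecutive hexapeptides.
example = "MK" + "IGDNVT" + "IGANAV" + "LGKGVK" + "IGSNTT" + "PE"
print(longest_tandem_run(example))
```

A run of four or more units is the situation the fourfold-repeat signature is designed to capture, although the signature itself tolerates residues other than G at the second position.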

[ 1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989). 
[ 2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992).
[ 3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992).
[ 4] Vuorio R., Harkonen T., Tolvanen M., Vaara M. FEBS Lett. 337:289-292(1994).

[ 5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995).

[0868] 295. Hexokinases signature. Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme that catalyzes
the phosphorylation of keto- and aldohexoses (e.g. glucose, mannose and fructose) using MgATP as the phosphoryl
donor. In vertebrates there are four major isoenzymes, commonly referred to as types I, II, III and IV. Type IV hexokinase,
which is often incorrectly designated glucokinase [3], is only expressed in liver and pancreatic beta-cells and plays an
important role in modulating insulin secretion; it is a protein of a molecular mass of about 50 Kd. Hexokinases of types
I to III, which have low Km values for glucose, have a molecular mass of about 100 Kd. Structurally they consist of a
very small N-terminal hydrophobic membrane-binding domain followed by two highly similar domains of 450 residues.
The first domain has lost its catalytic activity and has evolved into a regulatory domain. In yeast there are three different
isozymes: hexokinase PI (gene HXK1), PII (gene HXKB), and glucokinase (gene GLK1). All three proteins have a
molecular mass of about 50 Kd. All these enzymes contain one (or two in the case of types I to III isozymes) strongly
conserved region which has been shown [4] to be involved in substrate binding. A pattern from that region has been
derived.

[0869] Consensus pattern: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-[LF]-

[0870] [ 1] Middleton R.J. Biochem. Soc. Trans. 18:180-183(1990). [ 2] Griffin L.D., Gelb B.D., Wheeler D.A., Davison
D., Adams V., McCabe E.R. Genomics 11:1014-1024(1991). [ 3] Cornish-Bowden A., Luz Cardenas M. Trends Biochem.
Sci. 16:281-282(1991). [ 4] Schirch D.M., Wilson J.E. Arch. Biochem. Biophys. 254:385-396(1987).
[0871] 296. Histone H2A signature (his1)

Histone H2A is one of the four histones, along with H2B, H3 and H4, which forms the eukaryotic nucleosome core.
Using alignments of histone H2A sequences [1,2,E1], a conserved region in the N-terminal part of H2A was selected
as a signature pattern. This region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's
which are synthesized throughout the cell cycle.

[0872] Consensus pattern: [AC]-G-L-x-F-P-V- 

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0873] Histone H4 signature (his2) 

[0874] Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome
core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost
invariant in more than 2 billion years of evolution [1,E1]. The region used as a signature pattern is a pentapeptide found
in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated [2] and a histidine residue
which is implicated in DNA-binding [3].
[0875] Consensus pattern: G-A-K-R-H- 
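The position of the pentapeptide can be illustrated with a plain substring search. The sketch below assumes, as test input, the first 20 residues of mature histone H4 as commonly reported (initiator methionine removed); that fragment and the names find_motif and H4_N_TERMINUS are illustrative and are not taken from the sequence listing.

```python
# Assumed N-terminal fragment of mature histone H4 (residues 1 to 20).
H4_N_TERMINUS = "SGRGKGGKGLGKGGAKRHRK"


def find_motif(sequence, motif):
    """Return the 1-based (start, end) of the first occurrence, or None."""
    index = sequence.find(motif)
    if index < 0:
        return None
    return index + 1, index + len(motif)


# G-A-K-R-H contains no wildcard, so no regular expression is needed.
print(find_motif(H4_N_TERMINUS, "GAKRH"))
```

With this fragment the search reports positions 14 to 18, in line with the description, and the lysine and histidine mentioned above fall at positions 16 and 18.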

[ 1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).
[ 2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982).

[ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988).

[0876] Histone H3 signatures (his3) 

Histone H3 is one of the four histones, along with H2A, H2B and H4, which forms the eukaryotic nucleosome core. It
is a highly conserved protein of 135 amino acid residues [1,2,E1]. The following proteins have been found to contain
a C-terminal H3-like domain:

- Mammalian centromere protein CENP-A [3]. Could act as a core histone necessary for the assembly of centromeres.
- Yeast chromatin-associated protein CSE4 [4].
- Caenorhabditis elegans chromosome III encodes two highly related proteins (F54C8.2 and F58A4.3) whose C-terminal
section is evolutionarily related to the last 100 residues of H3. The function of these proteins is not yet known.

Two signature patterns were developed. The first one corresponds to a perfectly conserved heptapeptide in the
N-terminal part of H3. The second one is derived from a conserved region in the central section of H3.
[0877] Consensus pattern: K-A-P-R-K-Q-L-
Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).
[ 3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994).
[ 4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995).

[0878] Histone H2B signature (his4)

[0879] Histone H2B is one of the four histones, along with H2A, H3 and H4, which forms the eukaryotic nucleosome
core. Using alignments of histone H2B sequences [1,2,E1], a conserved region was selected in the C-terminal part
of H2B.

[0880] Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-
E-G-

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0881] 297. 'Homeobox' domain signature and profile (home1)

The 'homeobox' is a protein domain of 60 amino acids [1 to 5] first identified in a number of Drosophila homeotic
and segmentation proteins. It has since been found to be extremely well conserved in many other animals, including
vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Some of the proteins which contain a
homeobox domain play an important role in development. Most of these proteins are known to be sequence specific
DNA-binding transcription factors. The homeobox domain has also been found to be very similar to a region of the
yeast mating type proteins. These are sequence-specific DNA-binding proteins that act as master switches in yeast
differentiation by controlling gene expression in a cell type-specific fashion. A schematic representation of the
homeobox domain is shown below. The helix-turn-helix region is shown by the symbols 'H' (for helix), and 't' (for turn).

[Schematic: homeobox domain, positions 1 to 60, with the helix-turn-helix region marked by 'H' and 't']

The pattern to detect homeobox sequences that was developed is 24 residues long and spans
positions 34 to 57 of the homeobox domain.

[0882] Consensus pattern: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-
[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW] -

[ 1] Gehring W.J. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10, Oxford University Press, Oxford
(1994).

[ 2] Buerglin T.R. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72, Oxford University Press, Oxford
(1994).

[ 3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992).
[ 4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986).
[ 5] Schofield P.N. Trends Neurosci. 10:3-6(1987).

[0883] 'Homeobox' antennapedia-type protein signature (home2)

The homeotic Hox proteins are sequence-specific transcription factors. They are part of a developmental regulatory
system that provides cells with specific positional identities on the anterior-posterior (A-P) axis [1]. The Hox proteins
contain a 'homeobox' domain. In Drosophila and other insects, there are eight different Hox genes that are encoded
in two gene complexes, ANT-C and BX-C. In vertebrates there are 38 genes organized in four complexes. In six of the
eight Drosophila Hox genes the homeobox domain is highly similar and a conserved hexapeptide is found five to sixteen
amino acids upstream of the homeobox domain. The six Drosophila proteins that belong to this group are antennapedia
(Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx) and
are collectively known as the 'antennapedia' subfamily. In vertebrates the corresponding Hox genes are known [2] as
Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and
D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. As a signature pattern
for this subfamily of homeobox proteins, the conserved hexapeptide was used.
[0884] Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]-

[ 1] McGinnis W., Krumlauf R. Cell 68:283-302(1992).
[ 2] Scott M.P. Cell 71:551-553(1992).

[0885] 'Homeobox' engrailed-type protein signature (home3) 

[0886] Most proteins which contain a 'homeobox' domain can be classified [1,2], on the basis of their sequence
characteristics, in three subfamilies: engrailed, antennapedia and paired. Proteins currently known to belong to the
engrailed subfamily are:

- Drosophila segmentation polarity protein engrailed (en) which specifies the body segmentation pattern and is
required for the development of the central nervous system.
- Drosophila invected protein (inv).
- Silk moth proteins engrailed and invected, which may be involved in the compartmentalization of the silk gland.
- Honeybee E30 and E60.
- Grasshopper (Schistocerca americana) G-En.
- Mammalian and birds En-1 and En-2.
- Zebrafish Eng-1, -2 and -3.
- Sea urchin (Tripneustes gratilla) SU-HB-en.
- Leech (Helobdella triserialis) Ht-En.
- Caenorhabditis elegans ceh-16.

Engrailed homeobox proteins are characterized by the presence of a conserved region of some 20 amino-acid
residues located at the C-terminal of the 'homeobox' domain. As a signature pattern for this subfamily of proteins, a
stretch of eight perfectly conserved residues in this region was used.

[0887] Consensus pattern: L-M-A-[EQ]-G-L-Y-N-

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989).
[ 2] Gehring W.J. Science 236:1245-1252(1987).

[0888] 298. Isocitrate lyase signature (ICL)

Isocitrate lyase (EC 4.1.3.1) [1,2] is an enzyme that catalyzes the conversion of isocitrate to succinate and glyoxylate.
This is the first step in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and plants.
A cysteine, a histidine and a glutamate or aspartate have been found to be important for the enzyme's catalytic activity.
Only one cysteine residue is conserved between the sequences of the fungal, plant and bacterial enzymes; it is located
in the middle of a conserved hexapeptide that can be used as a signature pattern for this type of enzyme.

[0889] Consensus pattern: K-[KR]-C-G-H-[LMQ] [C is a putative active site residue]-

[ 1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989).
[ 2] Atomi H., Ueda M., Hikida M., Hishida T., Teranishi Y., Tanaka A. J. Biochem. 107:262-266(1990).
[0890] 299. Initiation factor 2 subunit

[0891] This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from
archaebacteria and IF-2 from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit.
[0892] [1] Kyrpides NC, Woese CR. Proc Natl Acad Sci U S A 1998;95:3726-3730.
[0893] 300. Initiation factor 3 signature 

Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors required for the initiation of protein biosynthesis in
bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex which
consists of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal
subunit; it is a basic protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that enhances
the poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits. In its mature form it is
a protein of about 400 residues whose central section is evolutionarily related to the sequence of bacterial IF-3 [2].
As a signature pattern a highly conserved region was selected, located in the central section of bacterial IF-3 and of
IF-3(chl).

[0894] Consensus pattern: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]-

[ 1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211-216(1993).
[ 2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994).

[0895] 301. Imidazoleglycerol-phosphate dehydratase signatures (IGPD)

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the seventh step in the
biosynthesis of histidine in bacteria, fungi and plants. In most organisms it is a monofunctional protein of about 22 to 29 Kd.
In some bacteria such as Escherichia coli it is the C-terminal domain of a bifunctional protein that includes a
histidinol-phosphatase domain [1]. Two signature patterns were developed that each include two consecutive histidine residues.

[0896] Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K -

[0897] [ 1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 203:585-606(1988).
[0898] 302. Indole-3-glycerol phosphate synthase signature (IGPS)

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan:
the ring closure of 1-(2-carboxy-phenylamino)-1-deoxyribulose into indol-3-glycerol-phosphate. In some bacteria, IGPS
is a single chain enzyme. In others - such as Escherichia coli - it is the N-terminal domain of a bifunctional enzyme
that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of tryptophan
biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a
glutamine amidotransferase N-terminal domain. The N-terminal section of IGPS contains a highly conserved region
which X-ray crystallography studies [1] have shown to be part of the active site cavity. This region was used as a
signature pattern for IGPS.

[0899] Consensus pattern: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[UV^^
[0900] [ 1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477-507(1992).
[0901] 303. (IL2) Interleukin 2. 31 members 

[0902] 304. (ILVD EDD) Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases have been
shown [1] to be evolutionarily related:

- Dihydroxy-acid dehydratase (EC 4.2.1.9) (gene ilvD or ILV3) which catalyzes the fourth step in the biosynthesis of
isoleucine and valine, the dehydration of 2,3-dihydroxy-isovaleric acid into alpha-ketoisovaleric acid.
- 6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd) which catalyzes the first step in the Entner-Doudoroff
pathway, the dehydration of 6-phospho-D-gluconate into 6-phospho-2-dehydro-3-deoxy-D-gluconate.
- Escherichia coli hypothetical protein yjhG.

Both enzymes are proteins of about 600 amino acid residues. Two highly conserved regions have been developed
as signature patterns. The first pattern is located in the N-terminal part and contains a cysteine that could be involved
in the binding of a 2Fe-2S iron-sulfur cluster [2]. The second pattern is located in the C-terminal half.

[0903] Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand]
Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]-

[0904] [ 1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 174:4638-4646(1992).
[ 2] Velasco J.A., Cansado J., Pena M.C., Kawakami T., Laborda J., Notario V. Gene 137:179-185(1993).
[0905] 305. IMP dehydrogenase / GMP reductase signature 

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the NAD-dependent reduction of IMP into XMP [1]. Inhibition of IMP dehydrogenase activity results in the cessation of DNA synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a possible target for cancer chemotherapy. Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive deamination of GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides, and maintains the intracellular balance of A and G nucleotides. IMP dehydrogenase and GMP reductase share many regions of sequence similarity. One of these regions is centered on a cysteine residue thought [3] to be involved in binding IMP. This region was used as a signature pattern.
[0906] Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue]

[1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988).
[2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 265:5292-5295(1990).
[3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988).

[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain 
[0908] [1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW; Biochemistry 1994;33:13164-13171. [2] Jefferson AB, Auethavekiat V, Pot DA, Williams LT, Majerus PW; J Biol Chem 1997;272:5983-5988. [3] Zhang X, Jefferson AB, Auethavekiat V, Majerus PW; Proc Natl Acad Sci U S A 1995;92:4853-4856. [4] York JD, Majerus PW; Proc Natl Acad Sci U S A 1990;87:9548-9552. [5] Neuwald AF, York JD, Majerus PW; FEBS Lett 1991;294:16-18.

[0909] 307. IQ calmodulin-binding motif

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, Szent-Gyorgyi AG, Cohen C; Nature 1994;368:306-312.

[2] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340.

[0910] 308. Inosine-uridine preferring nucleoside hydrolase family signature (IU nuc hydro)

Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (IU-nucleoside hydrolase or IUNH) is an enzyme first identified in protozoans [1] that catalyzes the hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides into ribose and the associated base, but has a preference for inosine and uridine as substrates. This enzyme is important for these parasitic organisms, which are deficient in de novo synthesis of purines, to salvage the host purine nucleosides. IUNH from Crithidia fasciculata has been sequenced and characterized. It is a homotetrameric enzyme of subunits of 34 Kd. A histidine has been shown to be important for the catalytic mechanism; it acts as a proton donor to activate the hypoxanthine leaving group. IUNH is evolutionarily related to a number of uncharacterized proteins from various biological sources, notably: - Escherichia coli hypothetical protein yaaF. - Escherichia coli hypothetical protein ybeK. - Escherichia coli hypothetical protein yeiK. - Fission yeast hypothetical protein SpAC17G8.02. - Yeast hypothetical protein YDR400w. - A hypothetical protein from the archaebacterium Desulfurolobus ambivalens. As a signature pattern for these proteins, a highly conserved region was selected located in the N-terminal extremity. This region contains four conserved aspartates that have been shown [2] to be located in the active site cavity.
[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A

[1] Gopaul D.N., Meyer S.L., Degano M., Sacchettini J.C., Schramm V.L. Biochemistry 35:5963-5970(1996).

[2] Degano M., Gopaul D.N., Scapin G., Schramm V.L., Sacchettini J.C. Biochemistry 35:5971-5981(1996).

[0912] 309. (Insulinase) 
Insulinase family, zinc-binding region signature
(aka Peptidase_M16)

[0913] A number of proteases dependent on divalent cations for their activity have been shown [1,2] to belong to one family, on the basis of sequence similarity. These enzymes are listed below.

- Insulinase (EC 3.4.24.56) (also known as insulysin or insulin-degrading enzyme or IDE), a cytoplasmic enzyme which seems to be involved in the cellular processing of insulin, glucagon and other small polypeptides.

- Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) (gene ptr), a periplasmic enzyme that degrades small peptides.

- Mitochondrial processing peptidase (EC 3.4.24.64) (MPP). This enzyme removes the transit peptide from the precursor form of proteins imported from the cytoplasm across the mitochondrial inner membrane. It is composed of two nonidentical homologous subunits termed alpha and beta. The beta subunit seems to be catalytically active while the alpha subunit has probably lost its activity.

- Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase or NRD convertase). This mammalian enzyme cleaves peptide substrates on the N-terminus of Arg residues in dibasic stretches.

- Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesis of the coenzyme pyrrolo-quinoline-quinone (PQQ). It is thought to be a protease that cleaves peptide bonds in a small peptide (gene pqqA), thus providing the glutamate and tyrosine residues necessary for the synthesis of PQQ.

- Yeast protein AXL1, which is involved in axial budding [3].

- Eimeria bovis sporozoite developmental protein.

- Escherichia coli hypothetical protein yddC and HI1368, the corresponding Haemophilus influenzae protein.

- Bacillus subtilis hypothetical protein ymxG.

- Caenorhabditis elegans hypothetical proteins C28F5.4 and F56D2.1.

[0914] It should be noted that in addition to the above enzymes, this family also includes the core proteins I and II of the mitochondrial bc1 complex (also called cytochrome c reductase or complex III), but the situation as to the activity or lack of activity of these subunits is quite complex:

- In mammals and yeast, core proteins I and II lack enzymatic activity.

- In Neurospora crassa and in potato, core protein I is equivalent to the beta subunit of MPP.

- In Euglena gracilis, core protein I seems to be active, while subunit II is inactive.

[0915] These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine. In pitrilysin, it has been shown [4] that this H-x-x-E-H motif is involved in enzyme activity; the two histidines bind zinc and the glutamate is necessary for catalytic activity. Non-active members of this family have lost from one to three of these active site residues. We developed a signature pattern that detects active members of this family as well as some inactive members.
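The relationship described above between the H-x-x-E-H motif and activity can be sketched in a few lines of Python. The sketch assumes that the five-residue window aligned with the motif has already been extracted from a sequence; the function name and the two example windows are invented for illustration. Reporting a window as complete is only a heuristic reading of the statement that inactive members have lost one to three of these residues, not an assay of enzyme activity.

```python
# Positions within the five-residue H-x-x-E-H window and the residue
# expected there: two zinc-binding histidines and the catalytic glutamate.
EXPECTED = {0: "H", 3: "E", 4: "H"}


def motif_status(window: str) -> dict:
    """Report which of the three active-site residues a window retains."""
    if len(window) != 5:
        raise ValueError("expected a five-residue window")
    lost = [pos for pos, residue in EXPECTED.items() if window[pos] != residue]
    return {
        "retained": len(EXPECTED) - len(lost),
        "lost_positions": lost,     # 0-based positions within the window
        "complete": not lost,       # True when all three residues are present
    }


# Hypothetical windows, not taken from any real protein.
print(motif_status("HFLEH"))
print(motif_status("RFLEK"))
```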

[0916] Consensus pattern: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-[GSTAN]-[GST] [The two H are zinc ligands] [E is the active site residue] Sequences known to belong to this class detected by the pattern: ALL active members as well as all MPP alpha subunits and core II subunits. Does not detect inactive core I subunits.

[0917] Note: these proteins belong to family M16 in the classification of peptidases [5].

[1] Rawlings N.D., Barrett A.J. Biochem. J. 275:389-391(1991).

[2] Braun H.-P., Schmitz U.K. Trends Biochem. Sci. 20:171-175(1995).

[3] Becker A.B., Roth R.A. Proc. Natl. Acad. Sci. U.S.A. 89:3835-3839(1992).

[4] Fujita A., Oka C., Arikawa Y., Katagai T., Tonouchi A., Kuhara S., Misumi Y. Nature 372:567-570(1994).
[5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).


[0918] 310. Involucrin repeat

[0919] Eckert RL, Yaffe MB, Crish JF, Murthy S, Rorke EA, Welter JF; J Invest Dermatol 1993;100:613-617.
[0920] 311. Isochorismatase family. This family are hydrolase enzymes.

[0921] Romao MJ, Turk D, Gomis-Ruth FX, Huber R, Schumacher G, Mollering H, Russmann L; J Mol Biol 1992;226:1111-1130.

[0922] 312. Inositol monophosphatase family signatures (inositol_P)

It has been shown [1] that several proteins share two sequence motifs. Two of these proteins are enzymes of the inositol phosphate second messenger signaling pathway: - Vertebrate and plant inositol monophosphatase (EC 3.1.3.25). - Vertebrate inositol polyphosphate 1-phosphatase (EC 3.1.3.57). The function of the other proteins is not yet clear: - Bacterial protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenoside 5'-phosphosulfate), or be useful in sulfite synthesis. - Escherichia coli protein suhB. Mutations in suhB result in the enhanced synthesis of heat shock sigma factor (htpR). - Neurospora crassa protein Qa-X. Probably involved in quinate metabolism. - Emericella nidulans protein qutG. Probably involved in quinate metabolism. - Yeast protein HAL2/MET22 [2], involved in salt tolerance as well as methionine biosynthesis. - Yeast hypothetical protein YHR046c. - Caenorhabditis elegans hypothetical protein F13G3.5. - A Rhizobium leguminosarum hypothetical protein encoded upstream of the pss gene for exopolysaccharide synthesis. - Methanococcus jannaschii hypothetical protein MJ0109. It is suggested [1] that these proteins may act by enhancing the synthesis or degradation of phosphorylated messenger molecules. From the X-ray structure of human inositol monophosphatase [3], it seems that some of the conserved residues are involved in binding a metal ion and/or the phosphate group of the substrate.

[0923] Consensus pattern: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[SG]-[ST]-x(2)-[FY]-x-[HKRNSTY] [The first D and the T bind a metal ion]

Consensus pattern: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA]

[1] Neuwald A.F., York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991).

[2] Glaeser H.-U., Thomas D., Gaxiola R., Montrichard F., Surdin-Kerjan Y., Serrano R. EMBO J. 12:3105-3110(1993).

[3] Bone R., Springer J.P., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992).


[0924] 313. Ion transport protein 

[0925] This family contains sodium, potassium and calcium ion channels. This family is 6 transmembrane helices in which the last two helices flank a loop which determines ion selectivity. In some sub-families (e.g. Na channels) the domain is repeated four times, whereas in others (e.g. K channels) the protein forms a tetramer in the membrane. A bacterial structure of the protein is known for the last two helices but is not in the Pfam family due to it lacking the first four helices.
[0926] 314. Isocitrate and isopropylmalate dehydrogenases signature (isodh)

Isocitrate dehydrogenase (IDH) [1,2] is an important enzyme of carbohydrate metabolism which catalyzes the oxidative decarboxylation of isocitrate into alpha-ketoglutarate. IDH is either dependent on NAD+ (EC 1.1.1.41) or on NADP+ (EC 1.1.1.42). In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In Escherichia coli the activity of a NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine residue; the phosphorylated form of IDH is completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85) (IMDH) [3,4] catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase (EC 1.1.1.93) [5] catalyzes the reduction of tartrate to oxaloglycolate. These enzymes are evolutionarily related [1,3,4,5]. The best conserved region of these enzymes is a glycine-rich stretch of residues located in the C-terminal section. This region was used as a signature pattern.
[0927] Consensus pattern: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-[STG]-[LIVMPA]-G-[LIVMF]

[1] Hurley J.H., Thorsness P.E., Ramalingam V., Helmers N.H., Koshland D.E. Jr., Stroud R.M. Proc. Natl. Acad. Sci. U.S.A. 86:8635-8639(1989).

[2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991).

[3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991).
[4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995).
[5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994).

[0928] 315. Jacalin-like lectin domain.

[0929] Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. The domain is also found in the animal prostatic spermine-binding protein (Swiss:P15501).
[0930] [1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M; Nat Struct Biol 1996;3:596-603.

[0931] 316. KH domain

[0932] KH motifs probably bind RNA directly. Auto antibodies to Nova, a KH domain protein, cause paraneoplastic opsoclonus ataxia.

[1] Burd CG, Dreyfuss G; Science 1994;265:615-621.

[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A; Cell 1996;85:237-245.
[0933] 317. Kelch motif

[0934] The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the motif. It has been shown that Swiss:Q04652 is related to Galactose Oxidase [1] for which a structure has been solved [2]. The kelch motif forms a beta sheet. Several of these sheets associate to form a beta propeller structure as found in neur,

[0935] [1] Bork P, Doolittle RF; J Mol Biol 1994;236:1277-1282. [2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson MJ, Keen JN, Yadav KD, Knowles PF; Nature 1991;350:87-90.

[0936] 318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature

[0937] The soybean trypsin inhibitor (Kunitz) family [1] is one of the numerous families of proteinase inhibitors. It comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol proteinases and aspartic proteinases, as well as some proteins that are probably involved in seed storage. This family is currently known to group the following proteins: - Trypsin inhibitors A, B, C, KTI1, and KTI2 from soybean. - Trypsin inhibitor DE3 from coral beans (Erythrina sp.). - Trypsin inhibitor DE5 from sandal bead tree. - Trypsin inhibitors 1A (WTI-1A), 1B (WTI-1B), and 2 (WTI-2) from goa bean. - Trypsin inhibitor from Acacia confusa. - Trypsin inhibitor from silk tree. - Chymotrypsin inhibitor 3 (WCI-3) from goa bean. - Cathepsin D inhibitors PDI and NDI from potato [2], which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley and wheat. - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein. - Sporamin from sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from potato tuber [6]. - Wound responsive protein gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these proteins contain from 170 to 200 amino acid residues and one or two intrachain disulfide bonds. The best conserved region is found in their N-terminal section and is used as a signature pattern.
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]

[1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).

[2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk V. FEBS Lett. 267:13-15(1990).

[3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989).

[4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaya K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659(1989).

[5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989).

[6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993).

[7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989).

[8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991).

[0939] 319. Beta-ketoacyl synthases active site

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of malonyl-ACP with the growing fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which correspond to different enzymatic activities; beta-ketoacyl synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the beta-ketoacyl synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The multifunctional 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic and which has a KAS domain in its N-terminal section. - Polyketide antibiotic synthase enzyme systems. Polyketides are secondary metabolites produced by microorganisms and plants from simple fatty acids. KAS is one of the components involved in the biosynthesis of the Streptomyces polyketide antibiotics granaticin [4], tetracenomycin C [5] and erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the biosynthesis of conidial green pigment. Wa is a protein of 216 Kd that contains a KAS domain. - Rhizobium nodulation protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl chain. - Yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: the acyl component of an activated acyl primer is transferred to a cysteine residue of the enzyme and is then condensed with an activated malonyl donor with the concomitant release of carbon dioxide. The sequence around the active site cysteine is well conserved and can be used as a signature pattern.

[0940] Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C is the active site residue]

[1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. 53:357-370(1988).
[2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).
[3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).
[4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2736(1989).

[5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).

[0941] 320. Kinesin motor domain signature and profile

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on microtubules), a central alpha-helical coiled coil domain that mediates the heavy chain dimerization; and a small globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous organelles. A number of proteins have been recently found that contain a domain similar to that of the kinesin 'motor' domain [1,4]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal segregation in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the microtubule's minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive chromosome segregation of nonexchange chromosomes during meiosis. - Human CENP-E [4]. CENP-E is a protein that associates with kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement and/or spindle elongation. - Human mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed toward the microtubule's plus end. - Yeast KAR3 protein, which is essential for yeast nuclear fusion during mating. KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis. - Yeast CIN8 and KIP1 proteins, which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to produce an outwardly directed force acting upon the poles. - Fission yeast cut7 protein, which is essential for spindle body duplication during mitotic division. - Emericella nidulans bimC, which plays an important role in nuclear division. - Emericella nidulans klpA. - Caenorhabditis elegans unc-104, which may be required for the transport of substances needed for neuronal cell differentiation. - Caenorhabditis elegans osm-3. - Xenopus Eg5, which may be involved in mitosis. - Arabidopsis thaliana KatA, KatB and katC. - Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both proteins seem to play a role in the rotation or twisting of the microtubules of the flagella. - Caenorhabditis elegans hypothetical protein T09A5.2. The kinesin motor domain is located in the N-terminal part of most of the above proteins, with the exception of KAR3, klpA, and ncd where it is located in the C-terminal section. The kinesin motor domain contains about 330 amino acids. An ATP-binding motif of type A is found near position 80 to 90; the C-terminal half of the domain is involved in microtubule-binding. The signature pattern for that domain is derived from a conserved decapeptide inside the microtubule-binding part.

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E

[1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995).

[2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[3] Brady S.T. Trends Cell Biol. 5:159-184(1995).

[4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991).


[0942] 321. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded). - Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila L29. - Fungi L27a (L29, CRP-1, CYH2). L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region was selected in the C-terminal section of these proteins.
[0943] Consensus pattern: K-IUVhfl(2)-IGASLhx-[GT]-x-[UVMA]-x(2.5)-(LIVM]-x-[LIVMF]-x(3,4)-(LIV^^ 
(2)-A-x(3)-[LIVM]-x(3)-G 

[0944] [1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[0945] 322. LBP / BPI / CETP family signature

The following mammalian lipid-binding serum glycoproteins belong to the same family [1,2,3]: - Lipopolysaccharide-binding protein (LBP). LBP binds to the lipid A moiety of bacterial lipopolysaccharides (LPS), a glycolipid present in the outer membrane of all Gram-negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor and may be responsible for the secretion of alpha-TNF. - Bactericidal permeability-increasing protein (BPI). Like LBP, BPI binds LPS and has a cytotoxic activity on Gram-negative bacteria. - Cholesteryl ester transfer protein (CETP). CETP is involved in the transfer of insoluble cholesteryl esters in reverse cholesterol transport. - Phospholipid transfer protein (PLTP). May play a key role in extracellular phospholipid transport and modulation of HDL particles. These proteins are structurally related and share many regions of sequence similarity. As a signature pattern one of these regions was selected, which is located in the N-terminal section of these proteins; a region which could be involved in the binding to the lipids [2].

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P

[1] Schumann R.R., Leong S.R., Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., Tobias P.S., Ulevitch R.J. Science 249:1429-1431(1990).
[2] Gray P.W., Flaggs G., Leong S.R., Gumina R.J., Weiss J., Ooi C.E., Elsbach P. J. Biol. Chem. 264:9505-9509(1989).

[3] Day J.R., Albers J.J., Lofton-Day C.E., Gilbert T.L., Ching A.F.T., Grant F.J., O'Hara P.J., Marcovina S.M., Adolphson J.L. J. Biol. Chem. 269:9388-9391(1994).

[0946] 323. LIM domain signature and profile

Recently [1,2] a number of proteins have been found to contain a conserved cysteine-rich domain of about 60 amino-acid residues. These proteins are: - Caenorhabditis elegans mec-3; a protein required for the differentiation of the set of six touch receptor neurons in this nematode. - Caenorhabditis elegans lin-11; a protein required for the asymmetric division of vulval blast cells. - Vertebrate insulin gene enhancer binding protein isl-1. Isl-1 binds to one of the two cis-acting protein-binding domains of the insulin gene. - Vertebrate homeobox proteins lim-1, lim-2 (lim-5) and lim-3. - Vertebrate lmx-1, which acts as a transcriptional activator by binding to the FLAT element; a beta-cell-specific transcriptional enhancer found in the insulin gene. - Mammalian LH-2, a transcriptional regulatory protein involved in the control of cell differentiation in developing lymphoid and neural cell types. - Drosophila protein apterous, required for the normal development of the wing and halter imaginal discs. - Vertebrate protein kinases LIMK-1 and LIMK-2. - Mammalian rhombotins. Rhombotin 1 (RBTN1 or TTG-1) and rhombotin-2 (RBTN2 or TTG-2) are proteins of about 160 amino acids whose genes are disrupted by chromosomal translocations in T-cell leukemia. - Mammalian and avian cysteine-rich protein (CRP), a 192 amino-acid protein of unknown function. Seems to interact with zyxin. - Mammalian cysteine-rich intestinal protein (CRIP), a small protein which seems to have a role in zinc absorption and may function as an intracellular zinc transport protein. - Vertebrate paxillin, a cytoskeletal focal adhesion protein. - Mouse testin. Mouse testin should not be confused with rat testin, which is a thiol protease homolog. - Sunflower pollen specific protein SF3. - Chicken zyxin. Zyxin is a low-abundance adhesion plaque protein which has been shown to interact with CRP. - Yeast protein LRG1, which is involved in sporulation [4]. - Yeast rho-type GTPase activating protein RGA1/DBM1. - Caenorhabditis elegans homeobox protein ceh-14. - Caenorhabditis elegans homeobox protein unc-97. - Yeast hypothetical protein YKR090w. - Caenorhabditis elegans hypothetical proteins C28H8.6. These proteins generally have two tandem copies of a domain, called LIM (for Lin-11 Isl-1 Mec-3), in their N-terminal section. Zyxin and paxillin are exceptions in that they contain respectively three and four LIM domains at their C-terminal extremity. In apterous, isl-1, LH-2, lin-11, lim-1 to lim-3, lmx-1 and ceh-14 and mec-3 there is a homeobox domain some 50 to 95 amino acids after the LIM domains. In the LIM domain, there are seven conserved cysteine residues and a histidine. The arrangement followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]. The LIM domain binds two zinc ions [5]. LIM does not bind DNA, rather it seems to act as an interface for protein-protein interaction. A pattern was developed that spans the first half of the LIM domain.

[0947] Consensus pattern: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF] [The 5 C's and the H bind zinc]
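Because the spacer between the second cysteine and the histidine varies in length, the positions of the zinc ligands cannot be read off by simple counting. The following Python sketch, offered as an illustration under stated assumptions, writes the pattern above as a regular expression with one capturing group per zinc ligand and reports their 1-based positions. The regular expression is a hand translation of the pattern; the function name and the test sequence are invented and do not correspond to any real LIM protein.

```python
import re

# Hand translation of
# C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF].
# Each parenthesized group captures one residue described as binding zinc.
LIM_FIRST_HALF = re.compile(
    r"(C).{2}(C).{15,21}[FYWH](H).{2}([CH]).{2}(C).{2}(C).{3}[LIVMF]"
)


def zinc_ligand_positions(sequence: str):
    """Return, for each match, the 1-based positions of the six ligands."""
    results = []
    for match in LIM_FIRST_HALF.finditer(sequence):
        results.append([match.start(i) + 1 for i in range(1, 7)])
    return results


# Invented sequence: two leading residues, then a stretch built to fit the
# pattern with a 16-residue spacer.
example = "MK" + "CAAC" + "A" * 16 + "FH" + "AAC" + "AAC" + "AAC" + "AAAL"
print(zinc_ligand_positions(example))
```

Capturing groups are used rather than fixed offsets so that the reported positions stay correct for any spacer length within the allowed range of 15 to 21 residues.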

[1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990).

[2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992).

[3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994).

[4] Mueller A., Xu G., Wells R., Hollenberg C.P., Piepersberg W. Nucleic Acids Res. 22:3151-3154(1994).

[5] Michelsen J.W., Schmeichel K.L., Beckerle M.C., Winge D.R. Proc. Natl. Acad. Sci. U.S.A. 90:4404-4408(1993).

[0948] 324. (LRR) Leucine Rich Repeat

CAUTION: This Pfam may not find all Leucine Rich Repeats in a protein. Leucine Rich Repeats are short sequence motifs present in a number of proteins with diverse functions and cellular locations. These repeats are usually involved in protein-protein interactions. Each Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated non-globular structures. Leucine Rich Repeats are often flanked by cysteine rich domains. Number of members: 3017
[1] The leucine-rich repeat: a versatile binding motif. Kobe B, Deisenhofer J; Trends Biochem Sci 1994;19:415-421.
[2] Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Kobe B, Deisenhofer J; Nature 1993;366:751-756.

[0949] 325. Plant lipid transfer protein family signature (LTP)

[0950] Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able to facilitate the transfer of phospholipids and other lipids across membranes. These proteins, whose subcellular location is not yet known, could play a major role in membrane biogenesis by conveying phospholipids such as waxes or cutin from their site of biosynthesis to membranes unable to form these lipids. Plant LTPs are proteins of about 9 Kd (90 amino acids) which contain eight conserved cysteine residues all involved in disulfide bridges, as shown in the following schematic representation.



[Schematic representation of the eight conserved cysteines and the disulfide bridges linking them]

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
[0951] Consensus pattern: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-[LIVM]-[ST]-x(3)-[DN]-C-x(2)-[LIVM] [The two Cs are involved in disulfide bonds]

[1] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[2] Arondel V., Kader J.C. Experientia 46:579-585(1990).

[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991).
[0952] 326. (LAMP) Lysosome-associated membrane glycoproteins signatures

Lysosome-associated membrane glycoproteins (lamp) [1] are integral membrane proteins, specific to lysosomes, and whose exact biological function is not yet clear. Structurally, the lamp proteins consist of two internally homologous lysosome-luminal domains separated by a proline-rich hinge region; at the C-terminal extremity there is a transmembrane region followed by a very short cytoplasmic tail. In each of the duplicated domains, there are two conserved disulfide bonds. This structure is schematically represented in the figure below.

 +-----+            +-----+         +-----+            +-----+
 |     |            |     |         |     |            |     |
xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx
<-------------------------> <Hinge> <------------------------><TM>

In mammals, there are two closely related types of lamp: lamp-1 and lamp-2. In chicken lamp-1 is known as LEP100. The macrophage protein CD68 (or macrosialin) [2] is a heavily glycosylated integral membrane protein whose structure consists of a mucin-like domain followed by a proline-rich hinge; a single lamp-like domain; a transmembrane region and a short cytoplasmic tail. Two signature patterns for this family of proteins were developed. The first one is centered on the first conserved cysteine of the duplicated domains. The second corresponds to a region that includes the extremity of the second domain, the totality of the transmembrane region and the cytoplasmic tail.
[0953] Consensus pattern: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3) [C is involved in a disulfide bond]

Consensus pattern: C-x(2)-D-x(3,4)-[LIVM](2)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ] [C is involved in a disulfide bond]

[1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991).
[2] Holness C.L., da Silva R.P., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 268:9661-9666(1993).
[0954] 327. Lipolytic enzymes 'G-D-S-L' family, serine active site

[0955] Recently [1], a family of lipolytic enzymes has been characterized. This family currently consists of the following proteins:

- Aeromonas hydrophila lipase/phosphatidylcholine-sterol acyltransferase.

- Xenorhabdus luminescens lipase 1.

- Vibrio mimicus arylesterase.

- Escherichia coli acyl-CoA thioesterase I (gene tesA).

- Vibrio parahaemolyticus thermolabile hemolysin/atypical phospholipase.

- Rabbit phospholipase AdRab-B, an intestinal brush border protein with esterase and phospholipase A/lysophospholipase activity that could be involved in the uptake of dietary lipids. AdRab-B contains four repeats of about 320 amino acids.

- Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG.

- A Pseudomonas putida hypothetical protein in trpE-trpG intergenic region.

A serine has been identified as part of the active site in the Aeromonas, Vibrio mimicus and Escherichia coli enzymes. It is located in a conserved sequence motif that can be used as a signature pattern for these proteins.

Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G [S is the active site residue]
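
By way of illustration only, a consensus pattern written in this notation can be translated mechanically into a regular expression and used to scan a protein sequence. The following Python sketch is not part of the original disclosure. The function name prosite_to_regex and the test sequence are hypothetical, and the sketch assumes the usual reading of the notation: x for any residue, [..] for a set of allowed residues, {..} for a set of excluded residues, and (n) or (n,m) for repeat counts. Terminal anchors are not handled.

```python
import re

# One pattern element: a residue, x, [allowed set] or {excluded set},
# optionally followed by a repeat count such as (4) or (1,2).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a dash-separated consensus pattern into a regex string."""
    parts = []
    for element in pattern.strip().split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError("unrecognized pattern element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

# The G-D-S-L family pattern given above.
gdsl_regex = prosite_to_regex("[LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G")

# Hypothetical test sequence, constructed to contain the motif.
hit = re.search(gdsl_regex, "MKAAAAGDSLSTGQ")
```

Because each element maps to exactly one regular-expression fragment, the same helper can be reused for the other consensus patterns in this section, provided they are first transcribed in clean form.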

[0956] 328. (Lipoprotein 4) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. - Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). - Escherichia coli rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF (or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pep. - Klebsiella pullulanase (gene pulA). - Klebsiella pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. - Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein yscJ. - Halocyanin from Natrobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this type of post-translational modification were derived.
[0957] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must be at least one Lys or one Arg in the first seven positions of the sequence.
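
The consensus pattern and the two additional rules can be combined into a single check. The following Python sketch is offered only as an illustration and is not part of the original disclosure. It assumes the pattern reads {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C and that positions are counted from 1 at the N-terminus of the precursor. The function name find_lipid_attachment_sites and the example sequences are hypothetical.

```python
import re

# Lookahead form of the pattern, so that overlapping candidates are all seen.
# The pattern spans 11 residues and the cysteine is the last of them.
_LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C)")
_PATTERN_LENGTH = 11

def find_lipid_attachment_sites(sequence):
    """Return 1-based positions of cysteines that satisfy the pattern and both rules."""
    sequence = sequence.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in sequence[:7]):
        return []
    sites = []
    for match in _LIPOBOX.finditer(sequence):
        cysteine_position = match.start() + _PATTERN_LENGTH  # 1-based
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cysteine_position <= 35:
            sites.append(cysteine_position)
    return sites

# Hypothetical precursor-like sequences used only to exercise the rules.
accepted = find_lipid_attachment_sites("MKATKLVLGAVILGSTLLAGCSSNAK")
no_basic_residue = find_lipid_attachment_sites("MAATALVLGAVILGSTLLAGCSSNAK")
cysteine_too_early = find_lipid_attachment_sites("MKAVILGSTLLAGCSS")
```

Applying the positional rules after the pattern match, rather than encoding them in the regular expression, keeps each rule visible and easy to adjust.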

[1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[3] von Heijne G. Protein Eng. 2:531-534(1989).
[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).

[0958] 329. (Lipoprotein 5) Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include (for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. - Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). - Escherichia coli rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF (or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes endoglucanase cel-3. - Haemophilus influenzae proteins Pal and Pep. - Klebsiella pullulanase (gene pulA). - Klebsiella pullulanase secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene lppL). - Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. - Streptococcus pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema pallidum membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein yscJ. - Halocyanin from Natrobacterium pharaonis [4], a membrane associated copper-binding protein. This is the first archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this type of post-translational modification have been developed.
[0959] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site] Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must be at least one Lys or one Arg in the first seven positions of the sequence.

[0960] [1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). [2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988). [3] von Heijne G. Protein Eng. 2:531-534(1989). [4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
[0961] 330. (Lum binding) Riboflavin synthase alpha chain family Lum-binding site signature. The following proteins have been shown [1,2] to be structurally and evolutionarily related: - Riboflavin synthase alpha chain (RS-alpha) (gene ribC in Escherichia coli, ribB in Bacillus subtilis and Photobacterium leiognathi, RIB5 in yeast). This enzyme synthesizes riboflavin from two moles of 6,7-dimethyl-8-(1'-D-ribityl)lumazine (Lum), a pteridine derivative. - Photobacterium phosphoreum lumazine protein (LumP) (gene luxL). LumP is a protein that modulates the color of the bioluminescence emission of bacterial luciferase. In the presence of LumP, light emission is shifted to higher energy values (shorter wavelength). LumP binds non-covalently to 6,7-dimethyl-8-(1'-D-ribityl)lumazine. - Vibrio fischeri yellow fluorescent protein (YFP) (gene luxY). Like LumP, YFP modulates light emission but towards a longer wavelength. YFP binds non-covalently to FMN. These proteins seem to have evolved from the duplication of a domain of about 100 residues. In its C-terminal section, this domain contains a conserved motif [KR]-V-N-[LI]-E which has been proposed to be the binding site for Lum. RS-alpha, which binds two molecules of Lum, has two perfect copies of this motif, while LumP, which binds one molecule of Lum, has a Glu instead of Lys/Arg in the first position of the second copy of the motif. Similarly, YFP, which binds to one molecule of FMN, also seems to have a potentially dysfunctional binding site by substitution of Gly for Glu in the last position of the first copy of the motif. Our signature pattern includes the Lum-binding motif.

[0962] Consensus pattern: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E
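
The relation between the number of intact motif copies and the number of bound Lum molecules can be illustrated by counting occurrences of the core motif [KR]-V-N-[LI]-E in a sequence. The Python sketch below is an illustration only and is not part of the original disclosure. The function name count_lum_motifs and both toy sequences are hypothetical and are not taken from any real protein.

```python
import re

# Core Lum-binding motif [KR]-V-N-[LI]-E written as a regular expression.
_CORE_MOTIF = re.compile(r"[KR]VN[LI]E")

def count_lum_motifs(sequence):
    """Count non-overlapping perfect copies of the core motif."""
    return len(_CORE_MOTIF.findall(sequence.upper()))

# Toy sequence with two perfect copies, as described for RS-alpha.
two_copies = count_lum_motifs("AAKVNLEGGGRVNIEAA")

# Toy sequence in which the second copy starts with Glu instead of Lys/Arg,
# as described for LumP, so that only one perfect copy remains.
one_copy = count_lum_motifs("AAKVNLEGGGEVNIEAA")
```

The count reflects only sequence identity with the motif; whether a given copy actually binds Lum is an experimental question.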

[1] O'Kane D.J., Woodward B., Lee J., Prasher D.C. Proc. Natl. Acad. Sci. U.S.A. 88:1100-1104(1991).
[2] O'Kane D.J., Prasher D.C. Mol. Microbiol. 6:443-449(1992).

[0963] 331. Lysyl oxidase putative copper-binding region signature

Lysyl oxidase (LOX) [1] is an extracellular copper-dependent enzyme that catalyzes the oxidative deamination of peptidyl lysine residues in precursors of various collagens and elastins. The deaminated lysines are then able to form aldehyde cross-links. LOX binds a single copper atom which seems to reside within an octahedral coordination complex which includes at least three histidine ligands. Four histidine residues are clustered in a central region of the enzyme. This region is thought to be involved in copper-binding and is called the 'copper-talon' [1]. This region was used as a signature pattern.

[0964] Consensus pattern: W-E-W-H-S-C-H-Q-H-Y-H
[0965] [1] Krebs C.J., Krawetz S.A. Biochim. Biophys. Acta 1202:7-12(1993).
[0966] 332. Metallo-beta-lactamase superfamily (lactamase_B)
[0967] [1] Neuwald AF, Liu JS, Lipman DJ, Lawrence CE, Nucleic Acids Res 1997;25:1665-1677. [2] Carfi A, Pares S, Duee E, Galleni M, Duez C, Frere JM, Dideberg O. EMBO J 1995;14:4914-4921.
[0968] 333. L-lactate dehydrogenase active site (ldh1)

L-lactate dehydrogenase (EC 1.1.1.27) (LDH) [1] catalyzes the reversible NAD-dependent interconversion of pyruvate to L-lactate. In vertebrate muscles and in lactic acid bacteria it represents the final step in anaerobic glycolysis. This tetrameric enzyme is present in prokaryotic and eukaryotic organisms. In vertebrates there are three isozymes of LDH: the M form (LDH-A), found predominantly in muscle tissues; the H form (LDH-B), found in heart muscle; and the X form (LDH-C), found only in the spermatozoa of mammals and birds. In birds and crocodilian eye lenses, LDH-B serves as a structural protein and is known as epsilon-crystallin [2]. L-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (L-hicDH) [3] catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids and L-2-hydroxy-carboxylic acids. L-hicDH is evolutionarily related to LDHs. As a signature for LDHs a region was selected that includes a conserved histidine which is essential to the catalytic mechanism.
[0969] Consensus pattern: [LIVMA]-G-[EQ]-H-G-[DN]-[ST] [H is the active site residue]

[1] Abad-Zapatero C., Griffith J.P., Sussman J.L., Rossmann M.G. J. Mol. Biol. 198:445-467(1987).
[2] Hendriks W., Mulders J.W.M., Bibby M.A., Slingsby C., Bloemendal H., de Jong W.W. Proc. Natl. Acad. Sci. U.S.A. 85:7114-7118(1988).
[3] Lerch H.-P., Frank R., Collins J. Gene 83:263-270(1989).

[0970] Malate dehydrogenase active site signature (ldh2)

Malate dehydrogenase (EC 1.1.1.37) (MDH) [1,2] catalyzes the interconversion of malate to oxaloacetate utilizing the NAD/NADH cofactor system. The enzyme participates in the citric acid cycle and exists in all aerobic organisms. While prokaryotic organisms contain a single form of MDH, in eukaryotic cells there are two isozymes: one which is located in the mitochondrial matrix and the other in the cytoplasm. Fungi and plants also harbor a glyoxysomal form which functions in the glyoxylate pathway. In plant chloroplasts there is an additional NADP-dependent form of MDH (EC 1.1.1.82) which is essential for both the universal C3 photosynthesis (Calvin) cycle and the more specialized C4 cycle. As a signature pattern for this enzyme a region was chosen that includes two residues involved in the catalytic mechanism [3]: an aspartic acid which is involved in a proton relay mechanism, and an arginine which binds the substrate.
[0971] Consensus pattern: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY] [D and R are the active site residues]
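
When a signature marks particular functional residues, the positions of those residues can be read directly from a match by placing them in capture groups. The following Python sketch is an illustration only and is not part of the original disclosure. It takes the consensus pattern to be [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY]. The function name mdh_active_site_positions and the toy sequence are hypothetical.

```python
import re

# The signature as a regular expression; the aspartic acid and the arginine
# are wrapped in groups 1 and 2 so that their positions can be reported.
_MDH_SIGNATURE = re.compile(r"[LIVM]T[TRKMN]L(D).{2}(R)[STA].{3}[LIVMFY]")

def mdh_active_site_positions(sequence):
    """Return 1-based positions of (Asp, Arg) for the first match, or None."""
    match = _MDH_SIGNATURE.search(sequence.upper())
    if match is None:
        return None
    return (match.start(1) + 1, match.start(2) + 1)

# Toy sequence constructed to contain the signature (not a real enzyme).
positions = mdh_active_site_positions("GGVTRLDHNRAAAALGG")
```

The same grouping approach applies to other signatures in this section that annotate an active site or metal-binding residue.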

[1] McAlister-Henn L. Trends Biochem. Sci. 13:178-181(1988).
[2] Gietl C. Biochim. Biophys. Acta 1100:217-234(1992).
[3] Birktoft J.J., Rhodes G., Banaszak L.J. Biochemistry 28:6065-6081(1989).
[4] Cendrin F., Chroboczek J., Zaccai G., Eisenberg H., Mevarech M. Biochemistry 32:4308-4313(1993).

[0972] 334. Legume lectins signatures 

Leguminous plants synthesize sugar-binding proteins which are called legume lectins [1,2]. These lectins are generally found in the seeds. The exact function of legume lectins is not known but they may be involved in the attachment of nitrogen-fixing bacteria to legumes and in the protection against pathogens. Legume lectins bind calcium and manganese (or other transition metals). Legume lectins are synthesized as precursor proteins of about 230 to 260 amino acid residues. Some legume lectins are proteolytically processed to produce two chains: beta (which corresponds to the N-terminal) and alpha (C-terminal). The lectin concanavalin A (conA) from jack bean is exceptional in that the two chains are transposed and ligated (by formation of a new peptide bond). The N-terminus of mature conA thus corresponds to that of the alpha chain and the C-terminus to the beta chain. Two signature patterns specific to legume lectins have been developed: the first is located in the C-terminal section of the beta chain and contains a conserved aspartic acid residue important for the binding of calcium and manganese; the second one is located in the N-terminal of the alpha chain.

[0973] Consensus pattern: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds manganese and calcium]
Consensus pattern: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]

[1] Sharon N., Lis H. FASEB J. 4:3198-320(1990).
[2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986).

[0974] 335. CoA-ligases (ligases-CoA)

[0975] This family includes the CoA ligases succinyl-CoA synthetase alpha and beta chains, malate CoA ligase and ATP-citrate lyase. Some members of the family utilise ATP, others use GTP.

[0976] [1] Wolodko WT, Fraser ME, James MN, Bridger WA, J Biol Chem 1994;269:10883-10890.
[0977] 336. Linker histone H1 and H5 family

[0978] Linker histone H1 is an essential component of chromatin structure. H1 links nucleosomes into higher order structures. Histone H1 is replaced by histone H5 in some cell types.

[0979] [1] Ramakrishnan V, Finch JT, Graziano V, Lee PL, Sweet RM. Nature 1993;362:219-223.
[0980] 337. Lipocalin signature (lip1)

Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and lipids share limited regions of sequence homology and a common tertiary structure architecture [1 to 5]. This is an eight stranded antiparallel beta-barrel with a repeated +1 topology enclosing an internal ligand binding site [1,3]. The name 'lipocalin' has been proposed [5] for this protein family. Proteins known to belong to this family are listed below (references are only provided for recently determined sequences). - Alpha-1-microglobulin (protein HC), which seems to bind porphyrin. - Alpha-1-acid glycoprotein (orosomucoid), which can bind a remarkable array of natural and synthetic compounds [6]. - Aphrodisin which, in hamsters, functions as an aphrodisiac pheromone. - Apolipoprotein D, which probably binds heme-related compounds. - Beta-lactoglobulin, a milk protein whose physiological function appears to bind retinol. - Complement component C8 gamma chain, which seems to bind retinol [7]. - Crustacyanin [8], a protein from lobster carapace, which binds astaxanthin, a carotenoid. - Epididymal-retinoic acid binding protein (E-RABP) [9] involved in sperm maturation. - Insectacyanin, a moth bilin-binding protein, and a related butterfly bilin-binding protein (BBP). - Late lactation protein (LALP), a milk protein from tammar wallaby [10]. - Neutrophil gelatinase-associated lipocalin (NGAL) (p25) (SV-40 induced 24p3 protein) [11]. - Odorant-binding protein (OBP), which binds odorants. - Plasma retinol-binding proteins (PRBP). - Human pregnancy-associated endometrial alpha-2 globulin. - Probasin (PB), a rat prostatic protein. - Prostaglandin D synthase (EC 5.3.99.2) (GSH-independent PGD synthetase), a lipocalin with enzymatic activity [12]. - Purpurin, a retinal protein which binds retinol and heparin. - Quiescence specific protein p20K from chicken (embryo CH21 protein). - Rodent urinary proteins (alpha-2-microglobulin), which may bind pheromones. - VNSP 1 and 2, putative pheromone transport proteins from mouse vomeronasal organ [13]. - Von Ebner's gland protein (VEGP) [14] (also called tear lipocalin), a mammalian protein which may be involved in taste recognition. - A frog olfactory protein, which may transport odorants. - A protein found in the cerebrospinal fluid of the toad Bufo marinus with a supposed function similar to transthyretin in transport across the blood brain barrier [15]. - Lizard's epididymal secretory protein IV (LESP IV), which could transport small hydrophobic molecules into the epididymal fluid during sperm maturation [16]. - Prokaryotic outer-membrane protein blc [17]. The sequences of most members of the family, the core or kernel lipocalins, are characterized by three short conserved stretches of residues [3,18]. Others, the outlier lipocalin group, share only one or two of these [3,18]. A signature pattern was built around the first, common to all outlier and kernel lipocalins, which occurs near the start of the first beta-strand.

[0981] Consensus pattern: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x-[LIVMTA]

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall superfamily, called the calycins, with the avidin/streptavidin <PDOC00499> and the cytosolic fatty-acid binding proteins <PDOC00188> families [3,19].

[1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990).
[2] Igarashi M., Nagata A., Toh H., Urade H., Hayaishi N. Proc. Natl. Acad. Sci. U.S.A. 89:5376-5380(1992).
[3] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988).
[5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987).
[6] Kremer J.M.H., Wilting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1989).
[7] Haefliger J.-A., Peitsch M.C., Jenne D., Tschopp J. Mol. Immunol. 28:123-131(1991).
[8] Keen J.N., Caceres I., Eliopoulos E.E., Zagalsky P.F., Findlay J.B.C. Eur. J. Biochem. 197:407-417(1991).
[9] Newcomer M.E. Structure 1:7-18(1993).
[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993).
[11] Kjeldsen L., Johnsen A.M., Sengelov H., Borregaard N. J. Biol. Chem. 268:10425-10432(1993).
[12] Peitsch M.C., Boguski M.S. Trends Biochem. Sci. 16:363-363(1991).
[13] Miyawaki A., Matsushita Y.R., Ryo Y., Mikoshiba T. EMBO J. 13:5835-5842(1994).
[14] Kock K., Ahlers C., Schmale H. Eur. J. Biochem. 221:905-916(1994).
[15] Achen M.G., Harms P.J., Thomas T., Richardson S.J., Wettenhall R.E.H., Schreiber G. J. Biol. Chem. 267:23170-23174(1992).
[16] Morel L., Dufaure J.-P., Depeiges A. J. Biol. Chem. 268:10274-10281(1993).
[17] Bishop R.E., Penfold S.S., Frost L.S., Holtje J.V., Weiner J.H. J. Biol. Chem. 270:23097-23103(1995).
[18] Flower D.R., North A.C.T., Attwood T.K. Biochem. Biophys. Res. Commun. 180:69-74(1991).
[19] Flower D.R. FEBS Lett. 333:99-102(1993).

[0982] Cytosolic fatty-acid binding proteins signature (lip2)

A number of low molecular weight proteins which bind fatty acids and other organic anions are present in the cytosol [1,2]. Most of them are structurally related and have probably diverged from a common ancestor. This structure is a ten stranded antiparallel beta-barrel, albeit with a wide discontinuity between the fourth and fifth strands, with a repeated +1 topology enclosing an internal ligand binding site [2,7]. Proteins known to belong to this family include: - Six tissue-specific types of fatty acid binding proteins (FABPs) found in liver, intestine, heart, epidermal, adipocyte, brain/retina. Heart FABP is also known as mammary-derived growth inhibitor (MDGI), a protein that reversibly inhibits proliferation of mammary carcinoma cells. Epidermal FABP is also known as psoriasis-associated FABP [3]. - Insect muscle fatty acid-binding proteins. - Testis lipid binding protein (TLBP). - Cellular retinol-binding proteins I and II (CRBP). - Cellular retinoic acid-binding protein (CRABP). - Gastrotropin, an ileal protein which stimulates gastric acid and pepsinogen secretion. It seems that gastrotropin binds to bile salts and bilirubins. - Fatty acid binding proteins MFB1 and MFB2 from the midgut of the insect Manduca sexta [4]. In addition to the above cytosolic proteins, this family also includes: - Myelin P2 protein, which may be a lipid transport protein in Schwann cells. P2 is associated with the lipid bilayer of myelin. - Schistosoma mansoni protein Sm14 [5] which seems to be involved in the transport of fatty acids. - Ascaris suum p18, a secreted protein that may play a role in sequestering potentially toxic fatty acids and their peroxidation products or that may be involved in the maintenance of the impermeable lipid layer of the eggshell. - Hypothetical fatty acid-binding proteins F40F4.2, F40F4.3, F40F4.4 and ZK742.5 from Caenorhabditis elegans. As a signature pattern for these proteins a segment from the N-terminal extremity was used.

[0983] Consensus pattern: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-[LIVMAKR]

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall superfamily, called the calycins, with the lipocalin <PDOC00187> and avidin/streptavidin <PDOC00499> families [6,7].

[1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987).
[2] Veerkamp J.H., Peeters R.A., Maatman R.G.H.J. Biochim. Biophys. Acta 1081:1-24(1991).
[3] Siegenthaler G., Hotz R., Chatellard-Gruaz D., Didierjean L., Hellman U., Saurat J.-H. Biochem. J. 302:363-371(1994).
[4] Smith A.F., Tsuchida K., Hanneman E., Suzuki T.C., Wells M.A. J. Biol. Chem. 267:380-384(1992).
[5] Moser D., Tendler M., Griffiths G., Klinkert M.-Q. J. Biol. Chem. 266:8447-8454(1991).
[6] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[7] Flower D.R. FEBS Lett. 333:99-102(1993).

[0984] 338. Lipoxygenases iron-binding region signatures

Lipoxygenases (EC 1.13.11.-) are a class of iron-containing dioxygenases which catalyze the hydroperoxidation of lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants where they may be involved in a number of diverse aspects of plant physiology including growth and development, pest resistance, and senescence or responses to wounding [1]. In mammals a number of lipoxygenase isozymes are involved in the metabolism of prostaglandins and leukotrienes [2]. Sequence data is available for the following lipoxygenases: - Plant lipoxygenases (EC 1.13.11.12). Plants express a variety of cytosolic isozymes as well as what seems [3] to be a chloroplast isozyme. - Mammalian arachidonate 5-lipoxygenase (EC 1.13.11.34). - Mammalian arachidonate 12-lipoxygenase (EC 1.13.11.31). - Mammalian erythroid cell-specific 15-lipoxygenase (EC 1.13.11.33). The iron atom in lipoxygenases is bound by four ligands, three of which are histidine residues [4]. Six histidines are conserved in all lipoxygenase sequences; five of them are found clustered in a stretch of 40 amino acids. This region contains two of the three histidine ligands of the iron; the other histidines have been shown [5] to be important for the activity of lipoxygenases. As signatures for this family of enzymes two patterns in the region of the histidine cluster were selected. The first pattern contains the first three conserved histidines and the second pattern includes the fourth and the fifth.

Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The second and third H's bind iron]
[0985] Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H
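
Since two signatures are given for this family, a candidate sequence may be scanned for both and the positions compared. The Python sketch below is an illustration only and is not part of the original disclosure. It takes the two patterns to read H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E and [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H. The function name lipoxygenase_signatures and the toy sequence are hypothetical.

```python
import re

# The two signatures from the histidine cluster, as regular expressions.
_SIGNATURE_1 = re.compile(r"H[EQ].{3}H.[LM][NQRC][GST]H[LIVMSTAC]{3}E")
_SIGNATURE_2 = re.compile(r"[LIVMA]HP[LIVM].[KRQ][LIVMF]{2}.[AP]H")

def lipoxygenase_signatures(sequence):
    """Return the 1-based start positions of each signature in the sequence."""
    sequence = sequence.upper()
    return {
        "signature_1": [m.start() + 1 for m in _SIGNATURE_1.finditer(sequence)],
        "signature_2": [m.start() + 1 for m in _SIGNATURE_2.finditer(sequence)],
    }

# Toy sequence built from one instance of each signature (not a real enzyme).
toy = "GG" + "HEAAAHALNGHLLLE" + "GGG" + "LHPLAKLLAAH" + "GG"
hits = lipoxygenase_signatures(toy)
```

Reporting both sets of positions lets a reader check whether the two matches fall within a single short stretch, as the text describes for the histidine cluster.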


[1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, Stumpf P.K., Ed., Vol. 9, pp.53-90, Academic Press, New-York, (1987).
[2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. Biochem. 55:69-102(1986).
[3] Peng Y.L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 269:3755-3761(1994).
[4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993).
[5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992).

[0986] 339. Fumarate lyases signature (lyase_1)

A number of enzymes, belonging to the lyase class, for which fumarate is a substrate have been shown [1,2] to share a short conserved sequence around a methionine which is probably involved in the catalytic activity of this type of enzymes. These enzymes are: - Fumarase (EC 4.2.1.2) (fumarate hydratase), which catalyzes the reversible hydration of fumarate to L-malate. There seem to be 2 classes of fumarases: class I are thermolabile dimeric enzymes (as for example: Escherichia coli fumA and fumB); class II enzymes are thermostable and tetrameric and are found in prokaryotes (as for example: Escherichia coli fumC) as well as in eukaryotes. The sequences of the two classes of fumarases are not closely related. - Aspartate ammonia-lyase (EC 4.3.1.1) (aspartase), which catalyzes the reversible conversion of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, except that ammonia rather than water is involved in the trans-elimination reaction. - Arginosuccinase (EC 4.3.2.1) (argininosuccinate lyase), which catalyzes the formation of arginine and fumarate from argininosuccinate, the last step in the biosynthesis of arginine. - Adenylosuccinase (EC 4.3.2.2) (adenylosuccinate lyase) [3], which catalyzes the eighth step in the de novo biosynthesis of purines, the formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from 1-(5-phosphoribosyl)-4-(N-succino-carboxamide). That enzyme can also catalyze the formation of fumarate and AMP from adenylosuccinate. - Pseudomonas putida 3-carboxy-cis,cis-muconate cycloisomerase (EC 5.5.1.2) (3-carboxymuconate lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic acids catabolism.
[0987] Consensus pattern: G-S-x(2)-M-x(2)-K-x-N

[1] Woods S.A., Schwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988).
[2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988).
[3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[4] Williams S.E., Woolridge E.M., Ransom S.C., Landro J.A., Babbitt P.C., Kozarich J.W. Biochemistry 31:9768-9776(1992).

[0988] 340. MCM family signature and profile 

Proteins shown to be required for the initiation of eukaryotic DNA replication share a highly conserved domain of about 210 amino-acid residues [1,2,3]. The latter shows some similarities [4] with that of various other families of DNA-dependent ATPases. Eukaryotes seem to possess a family of six proteins that contain this domain. They were first identified in yeast where most of them have a direct role in the initiation of chromosomal DNA replication by interacting directly with autonomously replicating sequences (ARS). They were thus called 'minichromosome maintenance proteins' with gene symbols prefixed by MCM. These six proteins are: - MCM2, also known as cdc19 (in S.pombe) [E1]. - MCM3, also known as DNA polymerase alpha holoenzyme-associated protein P1, RLF beta subunit or ROA. - MCM4, also known as CDC54, cdc21 (in S.pombe) or dpa (in Drosophila). - MCM5, also known as CDC46 or nda4 (in S.pombe). - MCM6, also known as mis5 (in S.pombe). - MCM7, also known as CDC47 or Prolifera (in A.thaliana). This family is also present in archaebacteria. In Methanococcus jannaschii there are four members: MJ0363, MJ0961, MJ1489 and MJECL13. The presence of a putative ATP-binding domain implies that these proteins may be involved in an ATP-consuming step in the initiation of DNA replication in eukaryotes. As a signature pattern, a perfectly conserved region was selected that represents a special version of the B motif found in ATP-binding proteins.
[0989] Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]

[1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992).
[2] Hu B., Burkhart R., Schulte D., Musahl C., Knippers R. Nucleic Acids Res. 21:5289-5293(1993).
[3] Tye B.-K. Trends Cell Biol. 4:160-166(1994).
[4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993).

[0990] 341. Macrophage migration inhibitory factor family signature (MIF)

A protein called macrophage migration inhibitory factor (MIF) [1] seems to exert an important role in host inflammatory responses. It plays a pivotal role in the host response to endotoxic shock and appears to serve as a pituitary "stress" hormone that regulates systemic inflammatory responses. MIF is a secreted protein of 115 residues which is not processed from a larger precursor. D-dopachrome tautomerase [2] is a mammalian cytoplasmic enzyme involved in melanin biosynthesis that tautomerizes D-dopachrome with concomitant decarboxylation to give 5,6-dihydroxyindole (DHI). It is a protein of 117 residues highly related to MIF. It must be noted that MIF binds glutathione and has been said to be related to glutathione S-transferases. This assertion was later disproved [3]. As a signature pattern for these proteins, a conserved region located in the central section was selected.
[0991] Consensus pattem: [DE]-P-C-A-x(3)-[LIVM]-x-S-l-G-x-[LI VMJ-G- 
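
The consensus patterns in this section use a compact notation: 'x' stands for any residue, square brackets list the residues allowed at a position, and a number in parentheses is a repeat count or range. As a minimal sketch of how such a pattern can be applied to a sequence, the Python fragment below translates the notation into a regular expression. The function name prosite_to_regex and the test sequence are illustrative assumptions, not part of the disclosure, and the curly-brace exclusion form is handled only because it is part of the general notation.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip("-. ").split("-"):
        # Split each element into its residue part and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # A bracketed set such as [LIVM] or a single letter is already valid regex.
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(core + repeat)
    return "".join(parts)

# MIF family pattern as given in paragraph [0991].
MIF_PATTERN = "[DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G"
mif_regex = re.compile(prosite_to_regex(MIF_PATTERN))

# Synthetic test sequence constructed to contain one match (an assumption for illustration).
toy_sequence = "AAEPCALCSLHSIGKIGGAA"
hit = mif_regex.search(toy_sequence)
print(prosite_to_regex(MIF_PATTERN), hit.span() if hit else None)
```

The same function can be applied to the other consensus patterns listed here, including those with ranges such as x(2,3).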

[1] Bucala R. Immunol. Lett. 43:23-26(1994).
[2] Odh G., Hindemith A., Rosengren A.-M., Rosengren E., Rorsman H. Biochem. Biophys. Res. Commun. 197:619-624(1993).
[3] Pearson W.R. Protein Sci. 3:525-527(1994).

[0992] 342. MIP family signature

Recently the sequences of a number of different proteins, all of which seem to be transmembrane channel proteins, have been found to be highly related [1 to 4]. These proteins are listed below.
- Mammalian major intrinsic protein (MIP). MIP is the major component of lens fiber gap junctions. Gap junctions mediate direct exchange of ions and small molecules from one cell to another.
- Mammalian aquaporins [5]. These proteins form water-specific channels that provide the plasma membranes of red cells and kidney proximal and collecting tubules with high permeability to water, thereby permitting water to move in the direction of an osmotic gradient.
- Soybean nodulin-26, a major component of the peribacteroid membrane induced during nodulation in legume roots after Rhizobium infection.
- Plant tonoplast intrinsic proteins (TIP). There are various isoforms of TIP: alpha (seed), gamma, Rt (root), and Wsi (water-stress induced). These proteins may allow the diffusion of water, amino acids and/or peptides from the tonoplast interior to the cytoplasm.
- Bacterial glycerol facilitator protein (gene glpF), which facilitates the movement of glycerol across the cytoplasmic membrane.
- Salmonella typhimurium propanediol diffusion facilitator (gene pduF).
- Yeast FPS1, a glycerol uptake/efflux facilitator protein.
- Drosophila neurogenic protein 'big brain' (bib). This protein may mediate intercellular communication; it may function by allowing the transport of certain molecule(s) and thereby sending a signal for an exodermal cell to become an epidermoblast instead of a neuroblast.
- Yeast hypothetical protein YFL054c.
- A hypothetical protein from the pepX region of Lactococcus lactis.

The MIP family proteins seem to contain six transmembrane segments. Computer analysis shows that these proteins probably arose by a tandem, intragenic duplication event from an ancestral protein that contained three transmembrane segments. As a signature pattern, a well conserved region was selected which is located in a probable cytoplasmic loop between the second and third transmembrane regions.

[0993] Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]

[1] Reizer J., Reizer A., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:235-257(1993).
[2] Baker M.E., Saier M.H. Jr. Cell 60:165-166(1990).
[3] Pao G.M., Wu L.-F., Johnson K.D., Hoefte H., Chrispeels M.J., Sweet G., Sandal N.N., Saier M.H. Jr. Mol. Microbiol. 5:33-37(1991).
[4] Wistow G.J., Pisano M.M., Chepelinsky A.B. Trends Biochem. Sci. 16:170-171(1991).
[5] Chrispeels M.J., Agre P. Trends Biochem. Sci. 19:421-425(1994).
[0994] 343. Mandelate racemase / muconate lactonizing enzyme family signatures

Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) (MLE) are two bacterial enzymes involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions, yet they are related at the level of their primary, quaternary (homooctamer) and tertiary structures [1,2]. A number of other proteins also seem to be evolutionarily related to these two enzymes. These are: - The various plasmid-encoded chloromuconate cycloisomerases (EC 5.5.1.7). - Escherichia coli protein rspA [3]; rspA seems to be involved in the degradation of homoserine lactone (HSL) or of one of its metabolites. - Escherichia coli hypothetical protein ycjG. - Escherichia coli hypothetical protein yidU. - A hypothetical protein from Streptomyces ambofaciens [4]. Two signature patterns have been developed for these enzymes; both contain conserved acidic residues.

[0995] The second pattern contains an aspartate and a glutamate which are ligands for either a magnesium ion (in MR) or a manganese ion (in MLE).

[0996] Consensus pattern: A-x-[SAGCN]-[SAG]-[LIVM]-[DEQ]-x-A-[LA]-x-[DE]-[LIA]-x-[GA]-[KRQ]-x(4)-[PSA]-[LIV]-x(2)-L-[LIVMF]-G

Consensus pattern: [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P [D and E bind a divalent metal ion]

[1] Neidhart D.J., Kenyon G.L., Gerlt J.A., Petsko G.A. Nature 347:692-694(1990).
[2] Petsko G.A., Kenyon G.L., Gerlt J.A., Ringe D., Kozarich J.W. Trends Biochem. Sci. 18:372-376(1993).
[3] Huisman G.W., Kolter R. Science 265:537-539(1994).
[4] Schneider D., Aigle B., Leblond P., Simonet J.M., Decaris B. J. Gen. Microbiol. 139:2559-2567(1993).

[0997] 344. Merozoite Surface Antigen 2 (MSA-2) family 

[0998] Thomas AW, Carr DA, Carter JM, Lyon JA, Mol Biochem Parasitol 1990;43:211-220.
[0999] 345. MSP (Major sperm protein) domain.
[1000] Major sperm proteins are involved in sperm motility. These proteins oligomerise to form filaments. Partial matches to this domain are also found in other non-MSP proteins. These include Swiss:P40075 and Swiss:P34593.
[1001] [1] Bullock TL, Roberts TM, Stewart M, J Mol Biol 1996;263:284-296. [2] King KL, Stewart M, Roberts TM, Seavy M, J Cell Sci 1992;101:847-857.

[1002] 346. (Matrix) Viral matrix protein. Found in morbillivirus, paramyxovirus and pneumovirus. Number of members: 105

[1003] 347. O-methyltransferase (methyltransf)

[1004] This family includes a range of O-methyltransferases. These enzymes utilise S-adenosyl methionine.
[1005] [1] Keller NP, Dischinger HC, Bhatnagar D, Cleveland TE, Ullah AH, Appl Environ Microbiol 1993;59:479-484.
[1006] 348. Magnesium chelatase, subunit ChlI
[1007] Magnesium-chelatase is a three-component enzyme that catalyses the insertion of Mg2+ into protoporphyrin IX. This is the first unique step in the synthesis of (bacterio)chlorophyll. Because of this, it is thought that Mg-chelatase has an important role in channeling intermediates into the (bacterio)chlorophyll branch in response to conditions suitable for photosynthetic growth. ChlI and BchD have molecular weights between 38 and 42 kDa.

[1008] [1] Walker CJ, Willows RD, Biochem J 1997;327:321-333. [2] Petersen BL, Jensen PE, Gibson LC, Stummann BM, Hunter CN, Henningsen KW, J Bacteriol 1998;180:699-704.
[1009] 349. Plasmid recombination enzyme (Mob_Pre)

[1010] With some plasmids, recombination can occur in a site-specific manner that is independent of RecA. In such cases, the recombination event requires another protein called Pre. Pre is a plasmid recombination enzyme. This protein is also known as Mob (conjugative mobilization).
[1011] [1] Priebe SD, Lacks SA, J Bacteriol 1989;171:4778-4784.
[1012] 350. Monooxygenase 

[1013] This family includes diverse enzymes that utilise FAD.

[1014] [1] Gatti DL, Palfey BA, Lah MS, Entsch B, Massey V, Ballou DP, Ludwig ML, Science 1994;266:110-114.
[1015] 351. Mov34 family 

[1016] Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors.

[1017] [1] Aravind L, Ponting CP, Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher HP, Hanachi P, Merrick WC, Biochimie 1996;78:903-907.
[1018] 352. Myc amino-terminal region (Myc_N_term)
[1019] The myc family belongs to the basic helix-loop-helix leucine zipper class of transcription factors, see HLH. Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved in cell replication [2].

[1020] [1] Facchini LM, Penn LZ, FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN, Trends Biochem Sci 1997;22:177-181.

[1021] 353. (Metallothio_2) Metallothionein. Members of this family are metallothioneins. These are cysteine-rich proteins that bind to heavy metals. Members of this family appear to be closest to Class II metallothioneins, see metalthio. Number of members: 55

[1022] [1] Medline: 98267202. Characterization of gene repertoires at mature stage of citrus fruits through random sequencing and analysis of redundant metallothionein-like genes expressed during fruit development. Moriguchi T, Kita M, Hisada S, Endo-Inagaki T, Omura M; Gene 1998;211:221-227.
[1023] 354. MAGE family

[1024] Members of the MAGE (melanoma antigen-encoding gene) family are expressed in a wide variety of tumors but not in normal cells, with the exception of the male germ cells, placenta, and, possibly, cells of the developing embryo. The cellular function of this family is unknown.

[1025] [1] McCurdy DK, Tai LQ, Nguyen J, Wang Z, Yang HM, Udar N, Naiem F, Concannon P, Gatti RA; Mol Genet Metab 1998;63:3-13.

[1026] 355. Malic enzymes signature. Malic enzymes, or malate oxidoreductases, catalyze the oxidative decarboxylation of malate into pyruvate, important for a wide range of metabolic pathways. There are three related forms of malic enzyme [1,2,3]: - NAD-dependent malic enzyme (EC 1.1.1.38), which uses preferentially NAD and has the ability to decarboxylate oxaloacetate (OAA). It is found in bacteria and insects. - NAD-dependent malic enzyme (EC 1.1.1.39), which uses preferentially NAD and is unable to decarboxylate OAA. It is found in the mitochondrial matrix of plants and is a heterodimer of highly related subunits. - NADP-dependent malic enzyme (EC 1.1.1.40), which has a preference for NADP and has the ability to decarboxylate OAA. This form has been found in fungi, animals and plants. In mammals, there are two isozymes: one mitochondrial and the other cytosolic. Plants also have two isozymes: chloroplastic and cytosolic. There are two other proteins which are closely structurally related to malic enzymes: - Escherichia coli protein sfcA, whose function is not yet known but which could be an NAD- or NADP-dependent malic enzyme. - Yeast hypothetical protein YKL029c, a probable malic enzyme. There are three well conserved regions in the enzyme sequences. Two of them seem to be involved in binding NAD or NADP. The significance of the third one, located in the central part of the enzymes, is not yet known. This region has been developed as a signature pattern for these enzymes.
[1027] Consensus pattern: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2)
[1028] [1] Artus N.N., Edwards G.E. FEBS Lett. 182:225-233(1985). [2] Loeber G., Infante A.A., Maurer-Fogy I., Krystek E., Dworkin M.B. J. Biol. Chem. 266:3016-3021(1991). [3] Long J.J., Wang J.-L., Berry J.O. J. Biol. Chem. 269:2827-2833(1994).
[1029] 356. (matrixin) 

Matrixins cysteine switch (aka peptidase_M10)

[1030] Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see 
<PDOC00129>), are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs 
from the mature enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3]; a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been called the 'cysteine switch' or 'autoinhibitor region'.
[1031] A cysteine switch has been found in the following zinc proteases:

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1032] Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]. Sequences known to belong to this class detected by the pattern: ALL, except for cat MMP-7 and mouse MMP-11.
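
Because the cysteine switch pattern marks a single functionally important residue, a scan can report the position of the zinc-chelating cysteine as well as the match itself. The following Python sketch assumes a hand-written regular expression equivalent to the pattern P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ]; the function name and the short test sequence are hypothetical and serve only as an illustration.

```python
import re

# Regular expression equivalent of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].
CYSTEINE_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")

def find_cysteine_switch(sequence):
    """Return (match_start, cysteine_index) pairs, both 0-based, for every match."""
    # The chelating cysteine is the third residue of the octapeptide.
    return [(m.start(), m.start() + 2) for m in CYSTEINE_SWITCH.finditer(sequence)]

# Synthetic sequence containing one octapeptide that satisfies the pattern.
toy_propeptide = "MKAAPRCGVPDVGGS"
print(find_cysteine_switch(toy_propeptide))
```

Returning the cysteine index separately makes it straightforward to check, for example, that the residue at that position is retained in a candidate sequence.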
[1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).
[3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
[5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).

[1033] 357. Vertebrate metallothioneins signature (metalthio)

Metallothioneins (MT) [1,2,3] are small proteins which bind heavy metals such as zinc, copper, cadmium, nickel, etc., through clusters of thiolate bonds. MTs occur throughout the animal kingdom and are also found in higher plants, fungi and some prokaryotes. On the basis of structural relationships MTs have been subdivided into three classes. Class I includes mammalian MTs as well as MTs from crustaceans and molluscs, but with clearly related primary structure. Class II groups together MTs from various species such as sea urchins, fungi, insects and cyanobacteria which display none or only very distant correspondence to class I MTs. Class III MTs are atypical polypeptides containing gamma-glutamylcysteinyl units. Vertebrate class I MTs are proteins of 60 to 68 amino acid residues; 20 of these residues are cysteines that bind to 7 bivalent metal ions. As a signature pattern, a region that spans 19 residues and which contains seven of the metal-binding cysteines was chosen; this region is located in the N-terminal section of class I MTs.
[1034] Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K
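
The statement that the signature spans 19 residues and contains seven cysteines can be checked directly against the pattern C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K. The short Python sketch below does this by summing the repeat counts of the pattern elements; the function name pattern_stats is an assumption introduced for illustration, and the code handles only fixed repeat counts such as x(2).

```python
import re

MT_PATTERN = "C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K"

def pattern_stats(pattern):
    """Return (total residues spanned, number of fixed cysteine positions)."""
    length = 0
    cysteines = 0
    for element in pattern.split("-"):
        # Separate the residue part from an optional fixed repeat count "(n)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core = m.group(1)
        count = int(m.group(2)) if m.group(2) else 1
        length += count
        if core == "C":
            cysteines += count
    return length, cysteines

print(pattern_stats(MT_PATTERN))  # (19, 7)
```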



[1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986).
[2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988).
[3] Binz P.-A. Thesis, 1996, University of Zurich.



[1035] 358. Mitochondrial energy transfer proteins signature (mito_carr)

Different types of substrate carrier proteins involved in energy transfer are found in the inner mitochondrial membrane [1 to 5]. These are:
- The ADP,ATP carrier protein (AAC) (ADP/ATP translocase), which exports ATP into the cytosol and imports ADP into the mitochondrial matrix. The sequence of AAC has been obtained from various mammalian, plant and fungal species.
- The 2-oxoglutarate/malate carrier protein (OGCP), which exports 2-oxoglutarate into the cytosol and imports malate or other dicarboxylic acids into the mitochondrial matrix. This protein plays an important role in several metabolic processes such as the malate/aspartate and the oxoglutarate/isocitrate shuttles.
- The phosphate carrier protein, which transports phosphate groups from the cytosol into the mitochondrial matrix.
- The brown fat uncoupling protein (UCP), which dissipates oxidative energy into heat by transporting protons from the cytosol into the mitochondrial matrix.
- The tricarboxylate transport protein (or citrate transport protein), which is involved in citrate-H+/malate exchange. It is important for the bioenergetics of hepatic cells as it provides a carbon source for fatty acid and sterol biosyntheses, and NAD for the glycolytic pathway.
- The Graves' disease carrier protein (GDC), a protein of unknown function recognized by IgG in patients with active Graves' disease.
- Yeast mitochondrial proteins MRS3 and MRS4. The exact function of these proteins is not known. They suppress a mitochondrial splice defect in the first intron of the COB gene and may act as carriers, exerting their suppressor activity by modulating solute concentrations in the mitochondrion.
- Yeast mitochondrial FAD carrier protein (gene FLX1).
- Yeast protein ACR1 [6], which seems essential for acetyl-CoA synthetase activity.
- Yeast protein PET8.
- Yeast protein PMT.
- Yeast protein RIM2.
- Yeast protein YHM1/SHM1.
- Yeast protein YMC1.
- Yeast protein YMC2.
- Yeast hypothetical proteins YBR291c, YEL006w, YER053c, YFR045w, YHR002w, and YIL006w.
- Caenorhabditis elegans hypothetical protein K11H3.3.

Two other proteins have been found to belong to this family, yet are not localized in the mitochondrial inner membrane:
- Maize amyloplast Brittle-1 protein. This protein, found in the endosperm of kernels, could play a role in amyloplast membrane transport.
- Candida boidinii peroxisomal membrane protein PMP47 [7]. PMP47 is an integral membrane protein of the peroxisome and it may play a role as a transporter.

These proteins all seem to be evolutionarily related. Structurally, they consist of three tandem repeats of a domain of approximately one hundred residues. Each of these domains contains two transmembrane regions. As a signature pattern, one of the most conserved regions in the repeated domain was selected, located just after the first transmembrane region.

[1036] Consensus pattern: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM]



[1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990).
[2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992).
[3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993).
[4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993).
[5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993).
[6] Palmieri F. FEBS Lett. 346:48-54(1994).
[7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427-428(1993).




[1037] 359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin)

A number of different prokaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1,2,3] to share a number of regions of sequence similarity. These enzymes are:
- Escherichia coli respiratory nitrate reductase (EC 1.7.99.4). This enzyme complex allows the bacteria to use nitrate as an electron acceptor during anaerobic growth. The enzyme is composed of three different chains: alpha, beta and gamma. The alpha chain (gene narG) is the molybdopterin-binding subunit. Escherichia coli encodes a second, closely related, nitrate reductase complex which also contains a molybdopterin-binding alpha chain (gene narZ).
- Escherichia coli anaerobic dimethyl sulfoxide reductase (DMSO reductase). DMSO reductase is the terminal reductase during anaerobic growth on various sulfoxide and N-oxide compounds. DMSO reductase is composed of three chains: A, B and C. The A chain (gene dmsA) binds molybdopterin.
- Escherichia coli biotin sulfoxide reductases (genes bisC and bisZ). This enzyme reduces a spontaneous oxidation product of biotin, BDS, back to biotin. It may serve as a scavenger, allowing the cell to use biotin sulfoxide as a biotin source.
- Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). The alpha chain (gene fdhA) of this dimeric enzyme binds a molybdopterin cofactor.
- Escherichia coli formate dehydrogenases -H (gene fdhF), -N (gene fdnG) and -O (gene fdoG). These enzymes are responsible for the oxidation of formate to carbon dioxide. In addition to molybdopterin, the alpha (catalytic) subunit also contains an active site selenocysteine.
- Wolinella succinogenes polysulfide reductase chain. This enzyme is a component of the phosphorylative electron transport system with polysulfide as the terminal acceptor. It is composed of three chains: A, B and C. The A chain (gene psrA) binds molybdopterin.
- Salmonella typhimurium thiosulfate reductase (gene phsA).
- Escherichia coli trimethylamine-N-oxide reductase (EC 1.6.6.9) (gene torA) [4].
- Nitrate reductase (EC 1.7.99.4) from Klebsiella pneumoniae (gene nasA), Alcaligenes eutrophus, Escherichia coli, Rhodobacter sphaeroides, Thiosphaera pantotropha (gene napA), and Synechococcus PCC 7942 (gene narB).

These proteins range from 715 amino acids (fdhF) to 1246 amino acids (narZ) in size. Three signature patterns for these enzymes were derived. The first is based on a conserved region in the N-terminal section and contains two cysteine residues perhaps involved in binding the molybdopterin cofactor. It should be noted that this region is not present in bisC. The second pattern is derived from a conserved region located in the central part of these enzymes.

[1038] Consensus pattern: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT]

Consensus pattern: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E

Consensus pattern: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-x(5)-A-x-[LIVM]-[ST]

[1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys. Acta 1057:157-185(1991).
[2] Bilous P.T., Cole S.T., Anderson W.F., Weiner J.H. Mol. Microbiol. 2:785-795(1988).
[3] Trieber C.A., Rothery R.A., Weiner J.H. J. Biol. Chem. 269:7103-7109(1994).
[4] Mejean V., Iobbi-Nivol C., Lepelletier M., Giordano G., Chippaux M., Pascal M.-C. Mol. Microbiol. 11:1169-1179(1994).

[1039] 360. Bacterial mutT domain signature

The bacterial mutT protein is involved in the GO system [1] responsible for removing an oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite to dA and dC residues of template DNA with almost equal efficiency, thus leading to A.T to G.C transversions. MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate. MutT is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a region of about 40 amino acid residues, which is found in the N-terminal part of mutT, can also be found in a variety of other prokaryotic, viral, and eukaryotic proteins. These proteins are:

- Streptococcus pneumoniae mutX.
- A mutT homolog from plasmid pSAM2 of Streptomyces ambofaciens.
- Bartonella bacilliformis invasion protein A (gene invA).
- Escherichia coli dATP pyrophosphohydrolase.
- Protein D250 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.
- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.1.6.-) [4].
- Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [5], which cleaves A-5'-PPPP-5'A to yield AMP and ATP.
- A protein encoded on the antisense RNA of the basic fibroblast growth factor gene in higher vertebrates.
- Yeast protein YSA1.
- Escherichia coli hypothetical protein yfaO.
- Escherichia coli hypothetical protein ygdU and HI0901, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjaD and HI0432, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yrfE.
- Bacillus subtilis hypothetical protein yqkG.
- Bacillus subtilis hypothetical protein yzgD.
- Yeast hypothetical protein YGL067w.

[1040] It is proposed [2] that the conserved domain could be involved in the active center of a family of pyrophosphate-releasing NTPases. As a signature pattern the core region of the domain was selected; it contains four conserved glutamate residues.

[1041] Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E

[1] Michaels M.L., Miller J.H. J. Bacteriol. 174:6321-6325(1992).
[2] Koonin E.V. Nucleic Acids Res. 21:4847-4847(1993).
[3] Mejean V., Salles C., Bullions M.J., Bessman M.J., Claverys J.-P. Mol. Microbiol. 11:323-330(1994).
[4] Sakumi K., Furuichi M., Tsuzuki T., Kakuma T., Kawabata S., Maki H., Sekiguchi M. J. Biol. Chem. 268:23524-23530(1993).
[5] Thorne N.M.H., Hankin S., Wilkinson M.C., Nunez C., Barraclough R., McLennan A.G. Biochem. J. 311:717-721(1995).

[1042] 361. Myb DNA-binding domain repeat signatures

The retroviral oncogene v-myb and its cellular counterpart c-myb encode nuclear DNA-binding proteins that specifically recognize the sequence YAAC(G/T)G [1]. The myb family also includes the following proteins:
- Drosophila D-myb [2].
- Vertebrate myb-like proteins A-myb and B-myb [3].
- Maize C1 protein, a trans-acting factor which controls the expression of genes involved in anthocyanin biosynthesis.
- Maize P protein [4], a trans-acting factor which regulates the biosynthetic pathway of a flavonoid-derived pigment in certain floral tissues.
- Arabidopsis thaliana protein GL1 [5], required for the initiation of differentiation of leaf hair cells (trichomes).
- A number of myb/c1-related proteins in maize and barley, whose roles are not yet known [6].
- Yeast BAS1 [7], a transcriptional activator for the HIS4 gene.
- Yeast REB1 [8], which recognizes sites within both the enhancer and the promoter of rRNA transcription, as well as upstream of many genes transcribed by RNA polymerase II.
- Fission yeast cdc5, a possible transcription factor whose activity is required for cell cycle progression and growth during G2.
- Fission yeast myb1, which regulates telomere length and function.
- Yeast hypothetical protein YMR213w.

One of the most conserved regions in all of these proteins is a domain of 160 amino acids. It consists of three tandem repeats of 51 to 53 amino acids. In myb, this repeat region has been shown [9] to be involved in DNA-binding. The major part of the first repeat is missing in retroviral v-myb sequences and in plant myb-related proteins. Yeast REB1 differs from the other proteins in this family in having a single myb-like domain. As shown in the following schematic representation, two signature patterns for myb-like domains were developed; the first is located in the N-terminal section, the second spans the C-terminal extremity of the domain.

[Schematic: a myb-like repeat sequence with the positions of the two patterns marked beneath it.]



[1043] Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]

Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]

Note: this pattern detects the three copies of the domain in myb, D-myb, A-myb and B-myb; the second of the two complete copies of plant myb-related proteins; and the last two copies of yeast BAS1.
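
Since the second myb pattern contains a variable-length gap, x(4,5), and is expected to match more than once in proteins carrying tandem repeats, a scan should report every non-overlapping match. The Python sketch below illustrates this with a regular expression equivalent to W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]; the function name and the two synthetic repeats (one with a four-residue gap, one with a five-residue gap) are assumptions made for illustration and are not myb sequences.

```python
import re

# Regular expression equivalent of W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].
MYB_REPEAT = re.compile(r"W.{2}[LI][SAG].{4,5}R.{8}[YW].{3}[LIVM]")

def find_repeats(sequence):
    """Return the (start, end) span of every non-overlapping pattern match."""
    return [m.span() for m in MYB_REPEAT.finditer(sequence)]

# Two synthetic repeats that differ only in the length of the variable gap.
short_gap = "WAALS" + "A" * 4 + "R" + "A" * 8 + "Y" + "AAA" + "L"
long_gap = "WAALS" + "A" * 5 + "R" + "A" * 8 + "Y" + "AAA" + "L"
toy_sequence = short_gap + "GG" + long_gap

print(find_repeats(toy_sequence))  # [(0, 23), (25, 49)]
```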

[1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. Nature 335:835-837(1988).
[2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. EMBO J. 6:3085-3090(1987).
[3] Nomura N., Takahashi M., Matsui M., Ishii S., Date T., Sasamoto S., Ishizaki R. Nucleic Acids Res. 16:11075-11090(1988).
[4] Grotewold E., Athma P., Peterson T. Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).
[5] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. Cell 67:483-493(1991).
[6] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. Mol. Gen. Genet. 216:183-187(1989).
[7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989).
[8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990).
[9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987).

[1044] 362. NAD-dependent glycerol-3-phosphate dehydrogenase signature
NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes the reversible reduction of dihydroxyacetone phosphate to glycerol-3-phosphate. It is a eukaryotic cytosolic homodimeric protein of about 40 Kd. As a signature pattern, a glycine-rich region that is probably [1] involved in NAD-binding was selected.
[1045] Consensus pattern: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-G-x-N

[1046] [1] Otto J., Argos P., Rossmann M.G. Eur. J. Biochem. 109:325-330(1980).
[1047] 363. Nucleosome assembly protein (NAP) 

[1048] It is thought that NAPs may be involved in regulating gene expression as a result of histone accessibility [1].

[1] Rodriguez P, Munroe D, Prawitt D, Chu LL, Bric E, Kim J, Reid LH, Davies C, Nakagama H, Loebbert R, Winterpacht A, Petruzzi MJ, Higgins MJ, Nowak N, Evans G, Shows T, Weissman BE, Zabel B, Housman DE, Pelletier J, Genomics 1997;44:253-265.

[2] Schnieders F, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet 1996;5:1801-1807.

[1049] 364. NB-ARC domain
van der Biezen EA, Jones JD, Curr Biol 1998;8:226-227.

[1050] 365. Nucleoside diphosphate kinases active site 

[1051] Nucleoside diphosphate kinases (EC 2.7.4.6) (NDK) [1] are enzymes required for the synthesis of nucleoside triphosphates (NTP) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization. In eukaryotes, there seems to be a small family of NDK isozymes, each of which acts in a different subcellular compartment and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and B) [2]. By random association (A6, A5B, ..., AB5, B6), these two kinds of chain form isoenzymes differing in their isoelectric point. NDK are proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is phosphorylated by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme can transfer its phosphate group to any NDP, to produce an NTP. NDK isozymes have been sequenced from prokaryotic and eukaryotic sources. It has also been shown [3] that the Drosophila awd (abnormal wing discs) protein is a microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of NDK has been highly conserved through evolution. There is a single histidine residue conserved in all known NDK isozymes, which is involved in the catalytic mechanism [2]. Our signature pattern contains this residue.
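
The series A6, A5B, ..., AB5, B6 mentioned above comprises seven subunit compositions. As an illustrative sketch only, the Python fragment below enumerates them and, under the added assumption that A and B chains assemble independently with a given fraction of A chains, computes the expected binomial proportion of each composition. The function names and the equal-assembly assumption are not taken from the disclosure.

```python
from math import comb

def hexamer_compositions(n=6):
    """List (number of A chains, number of B chains) from A6 down to B6."""
    return [(a, n - a) for a in range(n, -1, -1)]

def association_fractions(fraction_a, n=6):
    """Expected share of each composition if chains associate at random.

    Assumes independent incorporation of A chains with probability fraction_a,
    which gives a binomial distribution over the number of A chains.
    """
    return {
        (a, n - a): comb(n, a) * fraction_a ** a * (1 - fraction_a) ** (n - a)
        for a in range(n + 1)
    }

print(hexamer_compositions())
print(association_fractions(0.5)[(3, 3)])  # 0.3125 for equal amounts of A and B
```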

[1052] Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site residue]

[1] Parks R., Agarwal R. (In) The Enzymes (3rd edition) 8:307-334(1973).
[2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).
[3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990).

[1053] 366. Nitrite and sulfite reductases iron-sulfur/siroheme-binding site (NIR_SIR)
Nitrite reductases (NiR) [1] catalyze the reduction of nitrite into ammonium, the second step in the assimilation of nitrate. There are two types of
NiR: the higher plant chloroplastic form of NiR (EC 1.7.7.1) is a monomeric protein that uses reduced ferredoxin as
the electron donor, while fungal and bacterial NiR (EC 1.6.6.4) are homodimeric proteins that use NAD(P)H as the
electron donor. Both forms of NiR contain a siroheme-Fe and iron-sulfur centers. Sulfite reductase (NADPH) (EC
1.8.1.2) (SIR) [2] is the bacterial enzyme that catalyzes the reduction of sulfite to sulfide. SIR is an oligomeric enzyme
with a subunit composition of alpha(8)-beta(4); the alpha component is a flavoprotein (SIR-FP), while the beta
component is a siroheme, iron-sulfur protein (SIR-HP). Sulfite reductase (ferredoxin) (EC 1.8.7.1) [3] is a cyanobacterial
and plant monomeric enzyme that also catalyzes the reduction of sulfite to sulfide. Anaerobic sulfite reductase (EC
1.8.1.-) (ASR) [4] is a bacterial enzyme that catalyzes the NADH-dependent reduction of sulfite to sulfide. ASR is an
oligomeric enzyme composed of three different subunits. The C component (gene asrC) seems to be a siroheme, iron-sulfur
protein. These enzymes share a region of sequence similarity in their C-terminal half; this region, which spans
about 80 amino acids, includes four conserved cysteine residues. Two of the Cys are grouped together at the beginning
of the domain, and the two others are grouped in the middle of the domain. The cysteines are involved in the binding
of the iron-sulfur center; the last one also binds the siroheme group [2]. A signature pattern from the region around the
second cluster of cysteines was derived.

[1054] Consensus pattern: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF] [The two C's are iron-sulfur ligands]

[ 1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990).
[ 2] Crane B.R., Siegel L.M., Getzoff E.D. Science 270:59-67(1995).
[ 3] Gisselmann G., Klausmeier P., Schwenn J.D. Biochim. Biophys. Acta 1144:102-106(1993).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

[1055] 367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase signatures. Myristoyl-CoA:protein
N-myristoyltransferase (EC 2.3.1.97) (Nmt) [1] is the enzyme responsible for transferring a myristate group to
the N-terminal glycine of a number of cellular eukaryotic and viral proteins. Nmt is a monomeric protein of about 50 to
60 Kd whose sequence appears to be well conserved. Two highly conserved regions have been developed as signature
patterns. The first one is located in the central section, the second in the C-terminal part.
[1056] Consensus pattern: E-I-N-F-L-C-x-H-K
Consensus pattern: K-F-G-x-G-D-G
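
The two Nmt signatures can be checked together, since the description places the first in the central section and the second in the C-terminal part. The following is a minimal sketch, not a definitive implementation. It assumes the two patterns read E-I-N-F-L-C-x-H-K and K-F-G-x-G-D-G (the second and last positions are damaged in the scanned text), and the function names and the toy sequence in the usage note are hypothetical.

```python
import re

# Assumed readings of the two Nmt signatures; "x" (any residue) becomes ".".
NMT_SIGNATURES = {
    "central": re.compile(r"EINFLC.HK"),
    "c_terminal": re.compile(r"KFG.GDG"),
}


def locate_nmt_signatures(sequence):
    """Return {signature name: fractional start position} for each hit."""
    hits = {}
    for name, regex in NMT_SIGNATURES.items():
        match = regex.search(sequence)
        if match:
            # Express the position as a fraction of the sequence length.
            hits[name] = match.start() / len(sequence)
    return hits


def looks_like_nmt(sequence):
    """True if both signatures occur and the central one comes first."""
    hits = locate_nmt_signatures(sequence)
    return len(hits) == 2 and hits["central"] < hits["c_terminal"]
```

For a made-up sequence such as `"A" * 20 + "EINFLCVHK" + "A" * 20 + "KFGPGDG" + "A" * 5`, `looks_like_nmt` returns `True`; a sequence carrying only one of the two signatures is rejected.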

[1057] [ 1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375-430(1993).
[1058] 368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

[1059] ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC 2.7.7.27) catalyzes a
very important step in the biosynthesis of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: synthesis of
the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a
tetrameric allosterically regulated enzyme. It is a homotetramer in bacteria, while in plant chloroplasts and amyloplasts
it is a heterotetramer of two different, yet evolutionarily related, subunits. There are a number of conserved regions in
the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. Three of these regions were selected
as signature patterns. The first two are N-terminal and have been proposed to be part of the allosteric and/or
substrate-binding sites in the Escherichia coli enzyme (gene glgC). The third pattern corresponds to a conserved region in the
central part of the enzymes.

[1060] Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]
Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]
Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]

[ 1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant Mol. Biol. 17:1089-1093(1991).
[ 2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).

[1061] 369. Sodium/hydrogen exchanger family

[1062] Na/H antiporters are key transporters in maintaining the pH of actively metabolizing cells. The molecular
mechanisms of antiport are unclear.
These antiporters contain 10-12 transmembrane regions (M) at the amino-terminus and a large cytoplasmic region at
the carboxyl terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6
and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium
and hydrogen ions. The cytoplasmic region has little similarity throughout the family.

[1063] [1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5. [2] Orlowski J, Grinstein S; J Biol Chem 1997;272:22373-22376. [3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.
[1064] 370. Sodium:sulfate symporter family signature (Na_sulph_symp)

Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of
sodium ions (sodium symporters) can be grouped, on the basis of sequence and functional similarities, into a number
of distinct families. One of these families currently consists of the following proteins:

- Mammalian sodium/sulfate cotransporter [1].
- Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and citrate.
- Mammalian intestinal sodium/dicarboxylate cotransporter.
- Chlamydomonas reinhardtii putative sulfur deprivation response regulator SAC1 [3].
- Caenorhabditis elegans hypothetical proteins B0285.6, F31F6.6, K08E5.2 and R107.1.
- Escherichia coli hypothetical protein yfbS.
- Haemophilus influenzae hypothetical protein HI0608.
- Synechocystis strain PCC 6803 hypothetical protein sll0640.
- Methanococcus jannaschii hypothetical protein MJ0672.

These transporters are proteins of from 430 to 620 amino acids which are highly hydrophobic and which probably contain about
12 transmembrane regions. As a signature pattern, a conserved region was selected which is located in or near the
penultimate transmembrane region.

[1065] Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVM]-V

[ 1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 90:8073-8077(1993).
[ 2] Pajor A.M. Am. J. Physiol. 270:642-648(1996).
[ 3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996).

[1066] 371. NifU-like domain

[1067] This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein
from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown [1].
[1068] [1] Ouzounis C, Bork P, Sander C; Trends Biochem Sci 1994;19:199-200.

[1069] 372. Nitrilases / cyanide hydratase signatures

Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their corresponding acids and ammonia. They are widespread
in microbes as well as in plants, where they convert indole-3-acetonitrile to the hormone indole-3-acetic acid.
A conserved cysteine has been shown [1,2] to be essential for enzyme activity; it seems to be involved in a nucleophilic
attack on the nitrile carbon atom. Cyanide hydratase (EC 4.2.1.66) converts HCN to formamide. In phytopathogenic
fungi, it is used to avoid the toxic effect of cyanide released by wounded plants [3]. The sequence of cyanide hydratase
is evolutionarily related to that of nitrilases. Yeast hypothetical proteins YIL164C and YIL165C also belong to this family.
As signature patterns for these enzymes, two conserved regions were selected. The first is located in the N-terminal
section while the second, which contains the active site cysteine, is located in the central section.
[1070] Consensus pattern: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P
Consensus pattern: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR] [C is the active site residue]

[ 1] Kobayashi M., Izui H., Nagasawa T., Yamada H. Proc. Natl. Acad. Sci. U.S.A. 90:247-251(1993).
[ 2] Kobayashi M., Komeda H., Yanaka N., Nagasawa T., Yamada H. J. Biol. Chem. 267:20746-20751(1992).
[ 3] Wang P., Vanetten H.D. Biochem. Biophys. Res. Commun. 187:1048-1054(1992).

[1071] 373. NusB family 

[1072] The NusB protein is involved in the regulation of rRNA biosynthesis by transcriptional antitermination.
[1073] Huenges M, Holz C, Gschwind R, Peterandert R, Berglechner R, Richter G, Bacher A, Kessler H, Gemmecker
G; EMBO J 1998;17:4092-4100.

[1074] 374. (Neur Chan) Neurotransmitter-gated ion-channels signature

Neurotransmitter-gated ion-channels [1,2,3,4] provide the molecular basis for rapid signal transmission at chemical
synapses. They are post-synaptic oligomeric transmembrane complexes that transiently form an ionic channel upon the
binding of a specific neurotransmitter. Presently, the sequences of subunits from five types of neurotransmitter-gated
receptors are known:

- The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor endplates
of vertebrates, it is composed of four different subunits (alpha, beta, gamma and delta or epsilon) with a molar
stoichiometry of 2:1:1:1. In neurones, the AchR receptor is composed of two different types of subunits: alpha and non-alpha
(also called beta). Nicotinic AchRs are also found in invertebrates.
- The glycine receptor, an inhibitory chloride ion
channel. The glycine receptor is a pentamer composed of two different subunits (alpha and beta).
- The gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion channel. The quaternary structure of the
GABA receptor is complex; at least four classes of subunits are known to exist (alpha, beta, gamma, and delta) and
there are many variants in each class (for example: six variants of the alpha class have already been sequenced).
- The serotonin 5HT3 receptor. Serotonin is a biogenic hormone that functions as a neurotransmitter, a hormone and a
mitogen. There are seven major groups of serotonin receptors; six of these groups (5HT1, 5HT2, and 5HT4 to 5HT7)
transduce extracellular signals by activating G proteins, while 5HT3 is a ligand-gated cation-specific ion channel which,
when activated, causes fast, depolarizing responses in neurons.
- The glutamate receptor, an excitatory cation channel.
Glutamate is the main excitatory neurotransmitter in the brain. At least three different types of glutamate receptors
have been described and are named according to their selective agonists (kainate, N-methyl-D-aspartate (NMDA) and
quisqualate).

All known sequences of subunits from neurotransmitter-gated ion-channels are structurally related. They
are composed of a large extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic
transmembrane regions which form the ionic channel, followed by an intracellular region of variable length. A fourth
hydrophobic region is found at the C-terminal of the sequence. The sequences of subunits from the AchR, GABA, 5HT3,
and Gly receptors are clearly evolutionarily related and share many regions of sequence similarity. These sequence
similarities are either absent or very weak in the Glu receptors. In the N-terminal extracellular domain of AchR/GABA/5HT3/Gly
receptors, there are two conserved cysteine residues which, in AchR, have been shown to form a disulfide
bond essential to the tertiary structure of the receptor. A number of amino acids between the two disulfide-bonded
cysteines are also conserved. Therefore this region was used as a signature pattern for this subclass of proteins.
[1075] Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two C's are linked by a disulfide bond]

[ 1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).
[ 2] Betz H. Neuron 5:383-392(1990).
[ 3] Dingledine R., Myers S.J., Nicholas R.A. FASEB J. 4:2632-2645(1990).
[ 4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992).

[1076] 375. Orotidine 5'-phosphate decarboxylase active site

Orotidine 5'-phosphate decarboxylase (EC 4.1.1.23) (OMPdecase) [1,2] catalyzes the last step in the de novo biosynthesis
of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes OMPdecase is part, with orotate
phosphoribosyltransferase, of a bifunctional enzyme, while the prokaryotic and fungal OMPdecases are monofunctional
proteins. Some parts of the sequence of OMPdecase are well conserved across species. The best conserved region is
located in the N-terminal half of OMPdecases and is centered around a lysine residue which is essential for the catalytic
function of the enzyme. This region has been developed as a signature pattern.

[1077] Consensus pattern: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA] [K is the active site residue]

[ 1] Jacquet M., Guilbaud R., Garreau H. Mol. Gen. Genet. 211:441-445(1988).
[ 2] Kimsey H.H., Kaiser D. J. Biol. Chem. 267:819-824(1992).

[1078] 376. ATP synthase delta (OSCP) subunit signature

ATP synthase (proton-translocating ATPase) (EC 3.6.1.34) [1,2] is a component of the cytoplasmic membrane of
eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is
composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core,
termed coupling factor CF(1).

One of the subunits of the ATPase complex, known as subunit delta in bacteria and chloroplasts or the Oligomycin
Sensitivity Conferral Protein (OSCP) in mitochondria, seems to be part of the stalk that links CF(0) to CF(1). It either
transmits conformational changes from CF(0) into CF(1) or is involved in proton conduction [3].
The different delta/OSCP subunits are proteins of approximately 200 amino-acid residues (once the transit peptide
has been removed in the chloroplast and mitochondrial forms) which show only moderate sequence homology.
The signature pattern used to detect ATPase delta/OSCP subunits is based on a conserved region in the C-terminal
section of these proteins.

[1079] Consensus pattern: [LIVM]-x-[LIVMFY]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-x-[LIVM]-[KRHENQ]-x-[GSEN]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990).

[1080] 377. Aspartate and ornithine carbamoyltransferases signature

Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate
to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes
it is a domain in a multi-functional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals
[2]) that also catalyzes other steps of the biosynthesis of pyrimidines.

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate
to citrulline. In mammals this enzyme participates in the urea cycle [3] and is located in the mitochondrial matrix. In
prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it
is also involved in the degradation of arginine [4] (the arginine deaminase pathway).

It has been shown [5] that these two enzymes are evolutionarily related. The predicted secondary structures of both
enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues
which have been shown, by crystallographic studies [6], to be implicated in binding the phosphoryl group of carbamoyl
phosphate.

This region was selected as a signature for these enzymes.

Consensus pattern: F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T bind carbamoyl phosphate]

- Note: the residue in position 3 of the pattern makes it possible to distinguish between an ATCase (Glu) and an OTCase (Lys).
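
The note on position 3 can be turned into a simple classifier. The following is a minimal sketch under stated assumptions: the pattern is read as F-x-[EK]-x-S-[GT]-R-T, and the function name and the short test fragments in the usage note are made up for illustration rather than taken from any real entry.

```python
import re

# F-x-[EK]-x-S-[GT]-R-T as a regular expression; position 3 is captured
# so that it can be inspected after a match.
SIGNATURE = re.compile(r"F.([EK]).S[GT]RT")


def classify_carbamoyltransferase(sequence):
    """Return 'ATCase', 'OTCase', or None if the signature is absent."""
    match = SIGNATURE.search(sequence)
    if match is None:
        return None
    # Glu (E) in position 3 indicates an ATCase, Lys (K) an OTCase.
    return "ATCase" if match.group(1) == "E" else "OTCase"
```

With hypothetical fragments, `classify_carbamoyltransferase("AAFFEASTRTRL")` gives `"ATCase"` and `classify_carbamoyltransferase("GGFEKRSTRTRL")` gives `"OTCase"`. A match on this signature alone is only suggestive and would normally be confirmed against the full sequence.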

[ 1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11156-11165(1986).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989).
[ 4] Baur H., Stalon V., Falmagne P., Luethi E., Haas D. Eur. J. Biochem. 166:111-117(1987).
[ 5] Houghton J.E., Bencini D.A., O'Donovan G.A., Wild J.R. Proc. Natl. Acad. Sci. U.S.A. 81:4864-4868(1984).
[ 6] Ke H.-M., Honzatko R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037-4040(1984).

[1081] 378. Oleosins signature

Oleosins [1] are the proteinaceous components of plants' lipid storage bodies called oil bodies. Oil bodies are small
droplets (0.2 to 1.5 mu-m in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin
annulus. Oleosins may have a structural role in stabilizing the lipid body during desiccation of the seed, by preventing
coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during
seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both
the lipid and phospholipid moieties.

Oleosins are proteins of 16 Kd to 24 Kd and are composed of three domains: an N-terminal hydrophilic region of
variable length (from 30 to 60 residues); a central hydrophobic domain of about 70 residues; and a C-terminal amphipathic
region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up
of beta-strand structure and to interact with the lipids [2]. It is the only domain whose sequence is conserved and
therefore a section from that domain was selected as a signature pattern.

[1082] Consensus pattern: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-P-A

[ 1] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards E.-W., Jackson P.J., Cummins I., Gibbons T.,
Shaw C.H., Ryan A.J. Biochim. Biophys. Acta 1088:86-94(1991).
[ 2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992).

[1083] 379. (Orbi VP5) Orbivirus outer capsid protein VP5

[1084] This paper shows the location of the different capsid proteins and their relation to each other.
[1085] [1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;235:191-200.
[1086] 380. Orn/DAP/Arg decarboxylases family 2 signatures

Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into
two different families on the basis of sequence similarities [1,2,3]. The second family consists of:

- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the transformation of ornithine into
putrescine.
- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC catalyzes the conversion of
diaminopimelic acid into lysine, the last step in the biosynthesis of lysine.
- Pseudomonas syringae pv. tabaci protein tabA. tabA is probably involved in the biosynthesis of tabtoxin and is
highly similar to DAPDC.
- Bacterial and plant biosynthetic arginine decarboxylase (EC 4.1.1.19) (ADC). ADC catalyzes the transformation
of arginine into agmatine, the first step in the biosynthesis of putrescine from arginine.

The above proteins, while most probably evolutionarily related, do not share extensive regions of sequence similarity.
Two of the conserved regions were selected as signature patterns. The first pattern contains a conserved lysine residue
which is known, in mouse ODC [4], to be the site of attachment of the pyridoxal-phosphate group. The second pattern
contains a stretch of three consecutive glycine residues and has been proposed to be part of a substrate-binding region
[5].

These enzymes are collectively known as group IV decarboxylases [3].

[1087] Consensus pattern: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-[LIVMA]-x(3)-[GTE] [K is
the pyridoxal-P attachment site]

Consensus pattern: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ]

[ 1] Bairoch A. Unpublished observations (1993).
[ 2] Martin C., Cami B., Yeh P., Stragier P., Parsot C., Patte J.-C. Mol. Biol. Evol. 5:549-559(1988).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Poulin R., Lu L., Ackermann B., Bey P., Pegg A.E. J. Biol. Chem. 267:150-158(1992).
[ 5] Moore R.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).

[1088] 381. Osteopontin signature
Osteopontin is an acidic phosphorylated glycoprotein of about 40 Kd which is abundant in the mineral matrix of bones
and which binds tightly to hydroxyapatite [1,2,3]. It is suggested that osteopontin might function as a cell attachment
factor and could play a key role in the adhesion of osteoclasts to the mineral matrix of bone.
Osteopontin-K is a kidney protein which is highly similar to osteopontin and probably also involved in cell adhesion.
As a signature pattern a highly conserved region located at the N-terminal extremity of the mature protein was selected.
[1089] Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[ 1] Butler W.T. Connect. Tissue Res. 23:123-136(1989).
[ 2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
[ 3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993).
[1090] 382. Oxysterol-binding protein family signature

A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found
[1] to be evolutionarily related:

- Mammalian oxysterol-binding protein (OSBP). A protein of about 600 amino-acid residues that binds a variety of
oxysterols: oxygenated derivatives of cholesterol. OSBP seems to play a complex role in the regulation of sterol
metabolism.
- Yeast proteins HES1 and KES1; highly related proteins of 434 residues that seem to play a role in ergosterol
synthesis.
- Yeast OSH1, a protein of 859 residues that also plays a role in ergosterol synthesis.
- Yeast hypothetical protein YHR001w (437 residues).
- Yeast hypothetical protein YHR073w (996 residues).
- Yeast hypothetical protein YKR003w (448 residues).

[1091] All these proteins contain a moderately conserved domain of about 250 residues located in the C-terminal
half of OSBP, OSH1 and YHR073w and in the central section of the other proteins. As a signature pattern, the best
conserved part of this domain was selected, a region that contains a conserved pentapeptide.
[1092] Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A

[1093] [ 1] Jiang B., Brown J.L., Sheraton J., Fortin N., Bussey H. Yeast 10:341-353(1994).

[1094] 383. FMN oxidoreductase 

[1095] 384. Oxidoreductase FAD/NAD-binding domain 

Number of members: 250 

[1]
Medline: 92084635
The sequence of squash NADH:nitrate reductase and its relationship to the sequences of other flavoprotein
oxidoreductases. A family of flavoprotein pyridine nucleotide cytochrome reductases.
Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.

[2]
Medline: 95111952
Crystal structure of the FAD-containing fragment of corn nitrate reductase at 2.5 A resolution: relationship to other
flavoprotein reductases.
Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.

[1096] 385. (oxidored molyb) Eukaryotic molybdopterin oxidoreductases signature
A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1] to share a few regions of sequence
similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant
reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains:
an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding
domain and a C-terminal Mo-pterin domain.
- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly
similar to xanthine dehydrogenase in its sequence and domain structure.
- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about
900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain
(see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.
- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about
460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern
used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding
the molybdopterin cofactor.

[1097] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]
[1098] [1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys.
Acta 1057:157-185(1991).

[1099] 386. (oxidored q1) NADH-ubiquinone/plastoquinone (complex I), various chains
This family is part of complex I which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction
that is associated with proton translocation across the membrane. Number of members: 1824
[1]
Medline: 93110040
The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains. Walker JE;
Q Rev Biophys 1992;25:253-324.
[1100] 387. (oxidored q3) NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members.
[1101] 388. (oxidored q5) NADH-ubiquinone oxidoreductase chain 4, amino terminus
[1102] [1] Walker JE; Q Rev Biophys 1992;25:253-324.

[1103] 389. (oxidored q6) Respiratory-chain NADH dehydrogenase 20 Kd subunit signature
Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase) is an oligomeric
enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the chloroplast and in
cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of this bioenergetic
enzyme complex there is one with a molecular weight of 20 Kd (in mammals) [3], which is a component of the
iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster. The 20 Kd subunit has been
found to be:

- Nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa.
- Mitochondrial encoded in Paramecium (gene psbG).
- Chloroplast encoded in various higher plants (gene ndhK or psbG).

The 20 Kd subunit is highly similar to [4]:

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2.
- Subunit B of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoB).
- Subunit NQO6 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).
- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

As a signature pattern a highly conserved region was selected, located in the central section of this subunit, which
contains a conserved cysteine that is probably involved in the binding of the 4Fe-4S center.
[1104] Consensus pattern: [GN]-x-D-[EAST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT] [The C is a putative
4Fe-4S ligand]

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).
[ 3] Arizmendi J.M., Runswick M.J., Skehel J.M., Walker J.E. FEBS Lett. 301:237-242(1992).
[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[1105] 390. p53 tumor antigen signature

The p53 tumor antigen [1 to 5, E1, E2] is a protein found in increased amounts in a wide variety of transformed cells.
It is also detectable in many proliferating nontransformed cells, but it is undetectable or present at low levels in resting
cells. It is frequently mutated or inactivated in many types of cancer. p53 seems to act as a tumor suppressor in some,
but probably not all, tumor types. p53 is probably involved in cell cycle regulation, and may be a trans-activator that
acts to negatively regulate cellular division by controlling a set of genes required for this process.

p53 is a phosphoprotein of about 390 amino acids which can be subdivided into four domains: a highly charged acidic
region of about 75 to 80 residues, a hydrophobic proline-rich domain (position 80 to 150), a central region (from 150
to about 300), and a highly basic C-terminal region. The sequence of p53 is well conserved in vertebrate species;
attempts to identify p53 in other eukaryotic phyla have so far been unsuccessful.

As a signature pattern for p53 a perfectly conserved stretch of 13 residues located in the central region of the protein
was selected. This region, known as domain IV in [3], is involved (along with an adjacent region) in the binding of the
large T antigen of SV40. In man this region is the focus of a variety of point mutations in cancerous tumors.
[1106] Consensus pattern: M-C-N-S-S-C-M-G-G-M-N-R-R
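
Consensus patterns written in this notation can be translated mechanically into regular expressions for scanning a sequence. The following is a minimal sketch, assuming the usual reading of the notation: `x` is any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, and `(n)` or `(n,m)` is a repeat count. The function names and the toy sequence in the usage note are hypothetical, and N- or C-terminal anchors are not handled.

```python
import re

# One pattern element: a residue part plus an optional (n) or (n,m) repeat.
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a dash-separated consensus pattern into a regex string."""
    parts = []
    for element in pattern.strip().strip("-.").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("cannot parse element: %r" % element)
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                          # any amino acid
        elif residue.startswith("[") and residue.endswith("]"):
            token = residue                      # allowed residues
        elif residue.startswith("{") and residue.endswith("}"):
            token = "[^" + residue[1:-1] + "]"   # excluded residues
        elif len(residue) == 1 and residue.isalpha():
            token = residue                      # a single fixed residue
        else:
            raise ValueError("cannot parse element: %r" % element)
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_signature(sequence, pattern):
    """Return the 0-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence)]
```

For the p53 pattern above, `prosite_to_regex("M-C-N-S-S-C-M-G-G-M-N-R-R")` yields the literal string `MCNSSCMGGMNRR`, and `find_signature("AAAMCNSSCMGGMNRRPP", "M-C-N-S-S-C-M-G-G-M-N-R-R")` reports a single hit at position 3 in that made-up sequence.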

[ 1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991).
[ 2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990).
[ 3] Soussi T., Caron De Fromentel C., May P. Oncogene 5:945-952(1990).
[ 4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).
[ 5] Ullrich S.J., Anderson C.W., Mercer W.E., Appella E. J. Biol. Chem. 267:15259-15262(1992).
[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase signature

Delta 1-pyrroline-5-carboxylate reductase (P5CR) (EC 1.5.1.2) [1,2] is the enzyme that catalyzes the terminal step in
the biosynthesis of proline from glutamate, the NAD(P)H-dependent reduction of 1-pyrroline-5-carboxylate into proline.
The sequences of P5CR from eubacteria (gene proC), archaebacteria and eukaryotes show only a moderate level of
overall similarity. As a signature pattern, the best conserved region located in the C-terminal section of P5CR was
selected.

[1108] Consensus pattern: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T-x(2)-[AG]-[LIV]-x(2)-[LMF]-[DENQK]

[ 1] Delauney A.J., Verma D.P. Mol. Gen. Genet. 221:299-305(1990).
[ 2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111(1990).

[1109] 392. Poly-adenylate binding protein, unique domain.

[1110] 393. (PAL) Phenylalanine and histidine ammonia-lyases active site 

Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) is a key enzyme of plant and fungal phenylpropanoid metabolism
which is involved in the biosynthesis of a wide variety of secondary metabolites such as flavonoids, furanocoumarin
phytoalexins and cell wall components. These compounds have many important roles in plants during normal growth
and in responses to environmental stress. PAL catalyzes the removal of an ammonia group from phenylalanine to form
trans-cinnamate.

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in histidine degradation, the removal of an
ammonia group from histidine to produce urocanic acid.

The two types of enzymes are functionally and structurally related [1]. They are the only enzymes which are known to
have the modified amino acid dehydroalanine (DHA) in their active site. A serine residue has been shown [2,3,4] to be
the precursor of this essential electrophilic moiety. The region around this active site residue is well conserved and
can be used as a signature pattern.

[1111] Consensus pattern: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA] [S is the active site residue]

[ 1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes R.R. J. Biol. Chem.
265:18192-18199(1990).
[ 2] Langer M., Reck G., Reed J., Retey J. Biochemistry 33:6462-6467(1994).
[ 3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994).
[ 4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

[1112] 394. PAS domain 

CAUTION. This family does not currently match all known examples of PAS domains.
PAS motifs appear in archaea, eubacteria and eukarya. Probably
the most surprising identification of a PAS domain was that in
EAG-like K+-channels [1,3].
Number of members: 308
[1] Medline: 97446881
PAS domain S-boxes in archaea, bacteria and sensors for oxygen and redox.
Zhulin IB, Taylor BL, Dixon R;
Trends Biochem Sci 1997;22:331-333.
[2] Medline: 95275818
1.4 A structure of photoactive yellow protein, a cytosolic photoreceptor: unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED;
Biochemistry 1995;34:6278-6287.
[3] Medline: 98044337
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L;
Curr Biol 1997;7:674-677.
[1113] 395. (PBP) Phosphatidylethanolamine-binding protein family signature

Mammalian phosphatidylethanolamine-binding protein (also known as basic cytosolic 21 Kd protein) is a 186 residue
protein found in a variety of tissues [1]. It binds hydrophobic ligands, such as phosphatidylethanolamine, but also seems
[2] to bind nucleotides such as GTP and FMN. It is suggested that it could act in membrane remodeling during growth
and maturation. This protein belongs to a family that also includes:

- Drosophila antennal protein A5, a putative odorant-binding protein.
- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.
- Plasmodium falciparum putative phosphatidylethanolamine-binding protein.
- Toxocara canis secreted antigen TES-26. This larval protein has been shown to bind phosphatidylethanolamine.
- Yeast protein DKA1 (also known as NSP1 or TFS1). The function of this protein is not very clear.
- Yeast hypothetical protein YLR179C.
- Caenorhabditis elegans hypothetical protein F40A3.3.

As a signature pattern, the best conserved region was selected which is located in the end of the first third of the 
sequence of these proteins. 

[1114] Consensus pattern: [FYL]-x-[LVM]-[LIVF]-x-[TIV]-[DC]-P-D-x-P-[SN]-x(10)-H

[ 1] Seddiqi N., Bollengier P., Alliel P.M., Perin J.P., Bonnet P., Bucquoy S., Jolles P., Schoentgen F. J. Mol. Evol.
39:655-660(1994).

[ 2] Schoentgen F., Jolles P. FEBS Lett. 369:22-6(1995).

[1115] 396. PCI domain 
This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49
[1] Medline: ^308842
The PCI domain: a common theme in three multiprotein complexes.
Hofmann K, Bucher P;
Trends Biochem Sci 1998;23:204-205.
[2] Medline: 98266368
Homologues of 26S proteasome subunits are regulators of transcription and translation.
Aravind L, Ponting CP;
Protein Sci 1998;7:1250-1254.
[1116] 397. (PCMT) Protein-L-isoaspartate (D-aspartate) O-methyltransferase signature.

Protein-L-isoaspartate (D-aspartate) O-methyltransferase (EC 2.1.1.77) (PCMT) [1] (which is also known as L-isoaspartyl
protein carboxyl methyltransferase) is an enzyme that catalyzes the transfer of a methyl group from S-adenosylmethionine
to the free carboxyl groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and proteins. The enzyme does
not act on normal L-aspartyl residues. L-isoaspartyl and D-aspartyl are the products of the spontaneous deamidation and/or
isomerization of normal L-aspartyl and L-asparaginyl residues in proteins. PCMT plays a role in the repair and/or
degradation of these damaged proteins; the enzymatic methyl esterification of the abnormal residues can lead to their
conversion to normal L-aspartyl residues. PCMT is a well-conserved and widely distributed cytosolic protein of about
24 Kd. As a signature pattern, a conserved region in the central part of this enzyme has been developed.
[1117] Consensus pattern: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I

[1118] [ 1] Kagan R.M., McFadden H.J., McFadden P.N., O'Connor C., Clarke S. Comp. Biochem. Physiol. 117B:
379-385(1997).

[1119] 398. (PCNA) Proliferating cell nuclear antigen signatures 
Proliferating cell nuclear antigen (PCNA) [1,2] is a protein involved in DNA replication by acting as a cofactor for DNA
polymerase delta, the polymerase responsible for leading strand DNA replication.

A similar protein exists in yeast (gene POL30) [3] and is associated with polymerase III, the yeast analog of polymerase
delta. In baculoviruses the ETL protein has been shown [4] to be highly related to PCNA and is probably associated
with the viral encoded DNA polymerase. A homolog of PCNA is also found in archaebacteria.
As signatures for this family of proteins, two conserved regions were selected located in the N-terminal section. The
second one has been proposed to bind DNA.

[1120] Consensus pattern: [GA]-[LIVMF]-x-[LIVM]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-[VGA]-x^^^^
x-[LIVM]-x(4)-F

- Consensus pattern: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2)

[ 1] Bravo R., Frank R., Blundell P.A., McDonald-Bravo H. Nature 326:515-517(1987).

[ 2] Suzuka I., Hata S., Matsuoka M., Kosugi S., Hashimoto J. Eur. J. Biochem. 195:571-575(1991).

[ 3] Bauer G.A., Burgers P.M.J. Nucleic Acids Res. 18:261-265(1990).

[ 4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989).

[1121] 399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (EC 4.2.1.51) (PDT) catalyzes the decarboxylation of prephenate into phenylpyruvate. In
microorganisms PDT is involved in the terminal pathway of the biosynthesis of phenylalanine. In some bacteria such
as Escherichia coli PDT is part of a bifunctional enzyme (P-protein) that also catalyzes the transformation of chorismate
into prephenate (chorismate mutase) while in other bacteria it is a monofunctional enzyme. The sequences of
monofunctional PDT align well with the C-terminal part of that of P-proteins [1].

As signature patterns for PDT two conserved regions were selected. The first region contains a conserved threonine
which has been said to be essential for the activity of the enzyme in E. coli. The second region includes a conserved
glutamate. Both regions are in the C-terminal part of PDT.

[1122] Consensus pattern: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM]
[1123] [ 1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991).
[1124] 400. PDZ domain (also known as DHR or GLGF).
[1125] PDZ domains are found in diverse signaling proteins.
[1126] [1] Ponting CP, Phillips C, Davies KE, Blake DJ; Bioessays 1997;19:469-479.

[2] Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R; Cell 1996;85:1067-1076.

[3] Ponting CP; Protein Sci 1997;6:464-468.

[1127] 401. (PPDK_N_term) PEP-utilizing enzymes signatures

A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a
phospho-histidine intermediate have been shown to be structurally related [1,2,3,4]. These enzymes are:

- Pyruvate,orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the reversible phosphorylation of pyruvate
  and phosphate by ATP to PEP and diphosphate. In plants PPDK functions in the direction of the formation of
  PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean acid metabolism plants. In some
  bacteria, such as Bacteroides symbiosus, PPDK functions in the direction of ATP synthesis.

- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate,water dikinase). This enzyme catalyzes the reversible
  phosphorylation of pyruvate by ATP to form PEP, AMP and phosphate, an essential step in gluconeogenesis when
  pyruvate and lactate are used as a carbon source.

- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the first enzyme of the
  phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in bacteria.
  The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the
  cell membrane. The general mechanism of the PTS is the following: a phosphoryl group from PEP is transferred
  to enzyme-I (EI) of PTS which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers
  the phosphoryl group to a sugar-specific permease.

All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a
histidine residue. The sequence around that residue is highly conserved and can be used as a signature pattern for
these enzymes. As a second signature pattern a conserved region was selected in the C-terminal part of the
PEP-utilizing enzymes. The biological significance of this region is not yet known.

[1128] Consensus pattern: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] [H is phosphorylated]

- Consensus pattern: [DEQSK]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMF]-[GAS]-x(2)-R
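When a family is described by two signatures, a sequence can be checked against both and the hits reported separately. The following is a minimal sketch, assuming the two patterns above read G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] and [DEQSK]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMF]-[GAS]-x(2)-R. Both regular expressions are hand translations of those assumed readings; the labels, the function `scan` and the toy sequence are hypothetical.

```python
import re

# Hand-translated regular expressions for the two assumed signatures.
SIGNATURES = {
    "phosphohistidine": re.compile(r"G[GA].[TN].H[STA][STAV][LIVM]{2}[STAV][RG]"),
    "c_terminal": re.compile(
        r"[DEQSK].[LIVMF]S[LIVMF]G[ST]ND[LIVM].Q[LIVMFYGT][STALIV][LIVMF][GAS].{2}R"
    ),
}


def scan(sequence: str) -> dict[str, list[int]]:
    """Return, per signature, the 1-based start position of every match."""
    return {
        name: [m.start() + 1 for m in regex.finditer(sequence)]
        for name, regex in SIGNATURES.items()
    }


# Toy sequence (not a real protein) carrying one copy of each motif.
toy_sequence = "MM" + "GGATAHSALLSR" + "PPPP" + "DALSLGTNDLAQLTLGAAR"
hits = scan(toy_sequence)
print(hits)
```

Reporting the two signatures separately keeps it visible when a sequence carries only one of them, which is the case worth inspecting by hand.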

[ 1] Reizer J., Hoischen C., Reizer A., Pham T.N., Saier M.H. Jr. Protein Sci. 2:506-521(1993).

[ 2] Reizer J., Reizer A., Merrick M.J., Plunkett G. III, Rose D.J., Saier M.H. Jr. Gene 181:103-108(1996).

[ 3] Pocalyko D.J., Carroll L.J., Martin B.M., Babbitt P.C., Dunaway-Mariano D. Biochemistry 29:10757-10765(1990).

[ 4] Niersbach M., Kreuzaler F., Geerse R.H., Postma P., Hirsch H.J. Mol. Gen. Genet. 232:332-336(1992).
[1129] 402. (PEPCK ATP) Phosphoenolpyruvate carboxykinase (ATP) signature 

Phosphoenolpyruvate carboxykinase (ATP) (EC 4.1.1.49) (PEPCK) [1] catalyzes the formation of phosphoenolpyruvate
by decarboxylation of oxaloacetate while hydrolyzing ATP, a rate limiting step in gluconeogenesis (the biosynthesis of
glucose).

The sequence of this enzyme has been obtained from Escherichia coli, yeast, and Trypanosoma brucei; these three
sequences are evolutionarily related and share many regions of similarity. As a signature pattern a highly conserved
region was selected that contains four acidic residues and which is located in the central part of the enzyme. The
beginning of the pattern is located about 10 residues to the C-terminus of an ATP-binding motif 'A' (P-loop) (see
<PDOC00017>) and is also part of the ATP-binding domain [2].
[1130] Consensus pattern: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N
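The consensus patterns in these entries use a compact notation: a single letter is a fixed residue, `x` is any residue, `[...]` lists the allowed residues, and `(n)` or `(n,m)` gives a repeat count. The following is a minimal sketch of how such a pattern could be translated into a regular expression and applied to a sequence. The function name `prosite_to_regex`, the handling of `{...}` as an excluded set, and the toy sequence are assumptions made for illustration; anchors and other extensions of the notation are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split each element into its core and an optional repeat such as (3) or (2,3).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                     # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = re.escape(core)          # a single fixed residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


pepck_regex = prosite_to_regex("L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N")
# Toy sequence (not a real protein) containing one copy of the motif.
hit = re.search(pepck_regex, "AAALIGDDEHAWADAGIANKK")
print(pepck_regex, hit.start() + 1 if hit else None)
```

The same function accepts a variable-length gap such as `x(2,3)`, which becomes `.{2,3}` in the regular expression.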

- Note: phosphoenolpyruvate carboxykinase (GTP) (EC 4.1.1.32), an enzyme that catalyzes the same reaction but
  using GTP instead of ATP, is not related to the above enzyme (see <PDOC00421>).

[ 1] Medina V., Pontarollo R., Glaeske D., Tabel H., Goldie H. J. Bacteriol. 172:7151-7156(1990).
[ 2] Matte A., Goldie H., Sweet R.M., Delbaere L.T.J. J. Mol. Biol. 256:126-143(1996).

[1131] 403. (Pepcase) Phosphoenolpyruvate carboxylase active sites. Phosphoenolpyruvate carboxylase (EC
4.1.1.31) (PEPcase) catalyzes the irreversible beta-carboxylation of phosphoenolpyruvate by bicarbonate to yield
oxaloacetate and phosphate. The enzyme is found in all plants and in a variety of microorganisms. A histidine [1] and
a lysine [2] have been implicated in the catalytic mechanism of this enzyme; the regions around these active site
residues are highly conserved in PEPcase from various plants, bacteria and cyanobacteria and can be used as
signature patterns for this type of enzyme.

[1132] Consensus pattern: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] [H is an active site residue]
- Consensus pattern: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G [K is an active site residue]
[1133] [ 1] Terada K., Izui K. Eur. J. Biochem. 202:797-803(1991).
[ 2] Jiao J.-A., Podesta F.E., Chollet R., O'Leary M.H., Andreo C.S. Biochim. Biophys. Acta 1041:291-295(1990).
[1134] 404. PET112 family signature 

The following proteins from eukaryotes, prokaryotes and archaebacteria belong to the same family:

- Yeast mitochondrial protein PET112 [1], which plays an unknown role in the expression of mitochondrial genes,
  probably at the level of translation.
- Aspergillus nidulans mitochondrial protein nempA.
- Bacillus subtilis hypothetical protein yzdD.
- Moraxella catarrhalis hypothetical protein in bIoR-1 3' region.
- Mycoplasma genitalium hypothetical protein MG100.
- Methanococcus jannaschii hypothetical proteins MJ0019 and MJ0160.

The size of these proteins ranges from 419 to 630 amino acids. As a signature pattern, a conserved region located in
the N-terminal section was selected.

[1135] Consensus pattern: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P

[1136] [ 1] Mulero J.J., Rosenthal J.K., Fox T.D. Curr. Genet. 25:299-304(1994).
[1137] 405. (PFK) Phosphofructokinase signature 

Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in the glycolytic pathway. It catalyzes the
phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate. In bacteria PFK is a tetramer of identical
36 Kd subunits. In mammals it is a tetramer of 80 Kd subunits. Each 80 Kd subunit consists of two homologous domains
which are highly related to the bacterial 36 Kd subunits. In human there are three tissue-specific types of PFK
isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In yeast PFK is an octamer composed of four 100 Kd alpha
chains (gene PFK1) and four 100 Kd beta chains (gene PFK2); like the mammalian 80 Kd subunits, the yeast 100 Kd
subunits are composed of two homologous domains.
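The subunit figures quoted above imply approximate sizes for the assembled enzymes. The following is a small worked sketch that multiplies chain counts by the nominal chain masses given in the text; the dictionary layout and the function name `assembly_mass_kd` are hypothetical, and the results are nominal sums, not measured masses.

```python
# Nominal subunit composition taken from the paragraph above:
# each entry is a list of (number of chains, nominal chain mass in Kd).
PFK_ASSEMBLIES = {
    "bacterial": [(4, 36)],            # tetramer of identical 36 Kd subunits
    "mammalian": [(4, 80)],            # tetramer of 80 Kd subunits
    "yeast": [(4, 100), (4, 100)],     # four alpha chains plus four beta chains
}


def assembly_mass_kd(chains: list[tuple[int, int]]) -> int:
    """Sum count * mass over all chain types of one assembly."""
    return sum(count * mass for count, mass in chains)


masses = {name: assembly_mass_kd(chains) for name, chains in PFK_ASSEMBLIES.items()}
print(masses)
```

On these nominal figures the mammalian and yeast chains are roughly two to three times the size of the bacterial subunit, which fits the description of the larger chains as containing two homologous domains.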

As a signature pattern for PFK a region that contains three basic residues involved in fructose-6-phosphate binding
was selected.

[1138] Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R [The R/K, the H and the Q/R are involved in
fructose-6-P binding]

- Note: Escherichia coli has two phosphofructokinase isozymes which are encoded by genes pfkA (major) and pfkB
  (minor). The pfkB isozyme is not evolutionarily related to other prokaryotic or eukaryotic PFK's (see <PDOC00504>).

[ 1] Poorman R.A., Randolph A., Kemp R.G., Heinrikson R.L. Nature 309:467-469(1984).

[ 2] Heinisch J., Ritzel R.G., von Borstel R.C., Aguilera A., Rodicio R., Zimmermann F.K. Gene 78:309-321(1989).

[1139] 406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature

Phosphoglycerate mutase (EC 5.4.2.1) (PGAM) and bisphosphoglycerate mutase (EC 5.4.2.4) (BPGM) are structurally
related enzymes which catalyze reactions involving the transfer of phospho groups between the three carbon atoms
of phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions, although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate
  (2,3-DPG) as the primer of the reaction.

- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.

- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

In mammals, PGAM is a dimeric protein. There are two isoforms of PGAM: the M (muscle) and B (brain) forms. In
yeast, PGAM is a tetrameric protein. BPGM is a dimeric protein and is found mainly in erythrocytes where it plays a
major role in regulating hemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration.

The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate [3].
The bifunctional enzyme 6-phosphofructo-2-kinase / fructose-2,6-bisphosphatase (EC 2.7.1.105 and EC 3.1.3.46)
(PF2K) [4] catalyzes both the synthesis and the degradation of fructose-2,6-bisphosphate. PF2K is an important
enzyme in the regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, the fructose-2,6-bisphosphatase
reaction involves a phosphohistidine intermediate and the phosphatase domain of PF2K is structurally related to
PGAM/BPGM.

The bacterial enzyme alpha-ribazole-5'-phosphate phosphatase (gene cobC) which is involved in cobalamin
biosynthesis also belongs to this family [5].

A signature pattern was built around the phosphohistidine residue.
[1140] Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

- Note: some organisms harbor a form of PGAM independent of 2,3-DPG; this enzyme is not related to the family
  described above [6].

[ 1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res. Commun. 156:874-881
(1988).

[ 2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).
[ 3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).

[ 4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).
[ 5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).

[ 6] Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., Carreras J., Puigdomenech P., Pilkis S.J.,
Climent F. J. Biol. Chem. 267:12797-12803(1992).

[1141] 407. (PGI) Phosphoglucose isomerase signatures 

Phosphoglucose isomerase (EC 5.3.1.9) (PGI) [1,2] is a dimeric enzyme that catalyzes the reversible isomerization of
glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is
involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some
bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. PGI has been shown [3] to be identical
to neuroleukin, a neurotrophic factor which supports the survival of various types of neurons.

The sequence of PGI from many species ranging from bacteria to mammals is available and has been shown to be
highly conserved. As signature patterns for this enzyme two conserved regions were selected; the first region is located
in the central section of PGI, while the second one is located in its C-terminal section.
[1142] Consensus pattern: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G

- Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K

[ 1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond., B, Biol.
Sci. 293:145-157(1981).

[ 2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992).
[ 3] Faik P., Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988).

[1143] 408. (PGK) Phosphoglycerate kinase signature 

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] catalyzes the second step in the second phase of glycolysis, the
reversible conversion of 1,3-diphosphoglycerate to 3-phosphoglycerate with generation of one molecule of ATP. PGK
is found in all living organisms and its sequence has been highly conserved throughout evolution. It is a two-domain
protein; each domain is composed of six repeats of an alpha/beta structural motif. As a signature pattern for PGK's, a
conserved region in the N-terminal region was selected.
Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P
[1144] [ 1] Watson H.C., Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990).

[1145] 409. (PGM PMM) Phosphoglucomutase and phosphomannomutase phosphoserine signature

- Phosphoglucomutase (EC 5.4.2.2) (PGM). PGM is an enzyme responsible for the conversion of D-glucose
  1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose [1].

- Phosphomannomutase (EC 5.4.2.8) (PMM). PMM is an enzyme responsible for the conversion of D-mannose
  1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria. For
  example, in enterobacteria such as Escherichia coli there are two different genes coding for this enzyme: rfbK
  which is involved in the synthesis of the O antigen of lipopolysaccharide and cpsG which is required for the synthesis
  of the M antigen capsular polysaccharide [2]. In Pseudomonas aeruginosa PMM (gene algC) is involved in the
  biosynthesis of the alginate layer [3] and in Xanthomonas campestris (gene xanA) it is involved in the biosynthesis
  of xanthan [4]. In Rhizobium strain NGR234 (gene noeK) it is involved in the biosynthesis of the nod factor.

- Phosphoacetylglucosamine mutase (EC 5.4.2.3) which converts N-acetyl-D-glucosamine 1-phosphate into the
  6-phosphate isomer.

The catalytic mechanism of both PGM and PMM involves the formation of a phosphoserine intermediate [1]. The
sequence around the serine residue is well conserved and can be used as a signature pattern.

In addition to PGM and PMM there are at least three uncharacterized proteins that belong to this family [5,6]:

- Urease operon protein ureC from Helicobacter pylori.
- Escherichia coli protein mrsA.
- Paramecium tetraurelia parafusin, a phosphoglycoprotein involved in exocytosis.
- A Methanococcus vannielii hypothetical protein in the 3' region of the gene for ribosomal protein S10.

[1146] Consensus pattern: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE] [S is the phosphoserine residue]

Note: PMM from fungi do not belong to this family.

[ 1] Dai J.B., Liu Y., Ray W.J. Jr., Konno M. J. Biol. Chem. 267:6322-6337(1992).

[ 2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. Mol. Gen. Genet. 227:173-180(1991).

[ 3] Zielinski N.A., Chakrabarty A.M., Berry A. J. Biol. Chem. 266:9754-9763(1991).

[ 4] Koeplin R., Arnold W., Hoette B., Simon R., Wang G., Puehler A. J. Bacteriol. 174:191-199(1992).

[ 5] Bairoch A. Unpublished observations (1993).

[ 6] Subramanian S.V., Wyroba E., Andersen A.P., Satir B.H. Proc. Natl. Acad. Sci. U.S.A. 91:9832-9836(1994).
[1147] 410. PH domain profile 

The 'pleckstrin homology' (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins
involved in intracellular signaling or as constituents of the cytoskeleton [1 to 7].

The function of this domain is not clear; several putative functions have been suggested:

- binding to the beta/gamma subunit of heterotrimeric G proteins,
- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate,
- binding to phosphorylated Ser/Thr residues,
- attachment to membranes by an unknown mechanism.

It is possible that different PH domains have totally different ligand requirements.

The 3D structure of several PH domains has been determined [8]. All known cases have a common structure consisting
of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops connecting the
beta-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no totally invariant
residues within the PH domain.

Proteins reported to contain one or more PH domains belong to the following families:

- Pleckstrin, the protein where this domain was first detected, is the major substrate of protein kinase C in platelets.
  Pleckstrin is one of the rare proteins to contain two PH domains.

- Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic receptor kinases, the mu isoform of PKC
  and the trypanosomal NrkA family.

- Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily.

- Insulin Receptor Substrate 1 (IRS-1).

- Regulators of small G-proteins like guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH
  domains), guanine nucleotide exchange proteins like vav, dbl, SoS and yeast CDC24, GTPase activating proteins
  like rasGAP and BEM2/IPL2, and the human break point cluster protein bcr.

- Cytoskeletal proteins such as dynamin (see <PDOC00362>), Caenorhabditis elegans kinesin-like protein unc-104
  (see <PDOC00343>), spectrin beta-chain, syntrophin (2 PH domains) and yeast nuclear migration protein NUM1.

- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms gamma and
  delta. Isoform gamma contains two PH domains, the second one is split into two parts separated by about 400
  residues.

- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.

- Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac.

- Several yeast proteins involved in cell cycle regulation and bud formation like BEM2, BEM3, BUD4 and the
  BEM1-binding proteins BOI2 (BEB1) and BOI1 (BOB1).

- Caenorhabditis elegans protein MIG-10.

- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12.

- Yeast hypothetical proteins YBR129c and YHR155w.

The profile for the PH domain, which has been developed by Toby Gibson at the EMBL, covers the total length of the
domain. Several proteins contain large insertions in the PH domain and are thus difficult to detect with this profile. In
some of these cases, the profile will align only to one half of the PH domain.

- Sequences known to belong to this class detected by the pattern: ALL. But it should be noted that while all
  sequences containing PH domains are detected, not all PH domains are. Some of the split domains lie below the
  cutoff threshold.

[ 1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).

[ 2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).

[ 3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M. Trends Biochem. Sci. 18:343-348(1993).
[ 4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E. Trends Biochem. Sci. 19:349-353(1994).
[ 5] Pawson T. Nature 373:573-580(1995).
[ 6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).
[ 7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).
[ 8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).

411. PHD-finger
[1] Medline: 95216093
The PHD finger: implications for chromatin-mediated transcriptional regulation.
Aasland R, Gibson TJ, Stewart AF;
Trends Biochem Sci 1995;20:56-59.
Number of members: 181

[1148] 412. (PI-PLC-X) Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues but in the gamma isoforms one PH domain, two SH2 domains, and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.
Two profiles were developed, one covering the X-box, the other the Y-box.
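The fixed X-then-Y order and the variable spacing described above lend themselves to a simple check on profile hits. The following is a minimal sketch, assuming each profile hit is available as a name with 1-based start and end coordinates. The `DomainHit` type, the function names, the 100-residue threshold (taken from the upper end of the 50-100 residue range quoted in the text) and the example coordinates are all hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    name: str   # "X-box" or "Y-box"
    start: int  # 1-based first residue of the hit
    end: int    # 1-based last residue of the hit


def xy_spacing(hits: list[DomainHit]) -> int:
    """Number of residues between the end of the X-box and the start of the Y-box."""
    x_box = next(hit for hit in hits if hit.name == "X-box")
    y_box = next(hit for hit in hits if hit.name == "Y-box")
    if x_box.end >= y_box.start:
        raise ValueError("expected the X-box to precede the Y-box (NH2-X-Y-COOH)")
    return y_box.start - x_box.end - 1


def has_long_insert(hits: list[DomainHit], max_short_linker: int = 100) -> bool:
    """True when the X/Y spacing exceeds the short linker seen in most isoforms."""
    return xy_spacing(hits) > max_short_linker


# Hypothetical coordinates, not taken from any real protein.
short_linker = [DomainHit("X-box", 300, 450), DomainHit("Y-box", 520, 640)]
long_linker = [DomainHit("X-box", 320, 465), DomainHit("Y-box", 950, 1070)]
print(xy_spacing(short_linker), has_long_insert(short_linker))
print(xy_spacing(long_linker), has_long_insert(long_linker))
```

A long spacing is only a hint that additional domains sit between the two boxes; confirming a gamma isoform would still require detecting the inserted PH, SH2 and SH3 domains themselves.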

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).

[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).

[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1149] 413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), an eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals, there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation, and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues but in the gamma isoforms one PH domain, two SH2 domains, and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.
Two profiles were developed, one covering the X-box, the other the Y-box.

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).
[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1150] 414. (PK) Pyruvate kinase active site signature

Pyruvate kinase (EC 2.7.1.40) (PK) [1] catalyzes the final step in glycolysis, the conversion of phosphoenolpyruvate
to pyruvate with the concomitant phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions for
its activity. PK is found in all living organisms. In vertebrates there are four tissue-specific isozymes: L (liver), R (red
cells), M1 (muscle, heart, and brain), and M2 (early fetal tissues). In Escherichia coli there are two isozymes: PK-I
(gene pykF) and PK-II (gene pykA). All PK isozymes seem to be tetramers of identical subunits of about 500 amino
acid residues.

As a signature pattern for PK a conserved region was selected that includes a lysine residue which seems to be the
acid/base catalyst responsible for the interconversion of pyruvate and enolpyruvate, and a glutamic acid residue
implicated in the binding of the magnesium ion.

[1151] Consensus pattern: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]-[GSTA]-[LIVM] [K is the
active site residue] [E is a magnesium ligand]

[1152] [ 1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990).
[1153] 415. (PLDc) Phospholipase D. Active site motif

Phosphatidyteholine-hydrolyzing phosphollpase D (PLD) isoforms are activated by ADP-ribosylatk)n factors (ARFs). 

PLD produces phosphatidic acid from phosphatkJyteholine, which may be essential for the formatkyi of certain types 

of transport vesk:les or may be constitutive vescular transport to signal transduction pathways. 

PC-hydrotyzing PUD is a honnotogue of cardk)lipin synthase, phosphatidylserine synthase, bacterial PLDs, and viral 
45 proteins. 

Each of these appears to possess a domain duplk^tbn which is apparent by the presence of two motifs containing 
well-consen/ed histidine, lysine, and/or asparagine residues which may contribute to the active site, aspartk: acid. An 
E. coli endonuclease (nuc) and similar proteins appear to be PLD homobgues but possess only one of these motifs. 
The profile contained here represents only the putative active site regkyis, since an accurate multiple alignment of the 
so repeat units has r)tii been achieved. 
Number of members: 139
[1] Medline: 96303814
A novel family of phospholipase D homologues that includes phospholipid synthases and putative endonucleases: identification of duplicated repeats and potential active site residues.
Ponting CP, Kerr ID;
Protein Sci 1996;5:914-922.
[2] Medline: 96334293
A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes poxvirus envelope proteins.
Koonin EV;
Trends Biochem Sci 1996;21:242-243.
[3] Medline: 94327597
Cloning and expression of phosphatidylcholine-hydrolyzing phospholipase D from Ricinus communis L.
Wang X, Xu L, Zheng L;
J Biol Chem 1994;269:20312-20317.
[4] Medline: 97386825
Regulation of eukaryotic phosphatidylinositol-specific phospholipase C and phospholipase D.
Singer WD, Brown HA, Sternweis PC;
Annu Rev Biochem 1997;66:475-509.
[1154] 416. (PMI typeI) Phosphomannose isomerase type I signatures

Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes the interconversion of mannose-6-phosphate and fructose-6-phosphate. In eukaryotes, it is involved in the synthesis of GDP-mannose, which is a constituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, it is involved in a variety of pathways including capsular polysaccharide biosynthesis and D-mannose metabolism.

Three classes of PMI have been defined on the basis of sequence similarities [1]. The first class comprises all known eukaryotic PMI as well as the enzyme encoded by the manA gene in enterobacteria such as Escherichia coli. Class I PMIs are proteins of about 42 to 50 Kd which bind a zinc ion essential for their activity.

As signature patterns for class I PMI, two conserved regions were selected. The first one is located in the N-terminal section of these proteins, the second in the C-terminal half. Both patterns contain a residue involved [3] in the binding of the zinc ion.

[1155] - Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand]

- Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K [H is a zinc ligand]
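Consensus patterns of this kind can be searched mechanically. The following is a minimal sketch, assuming the usual reading of the notation (x is any residue, square brackets list allowed residues, curly braces list excluded residues, and a parenthesized number is a repeat count). The function name `prosite_to_regex` and the toy sequence are hypothetical and used only for illustration; they are not part of the disclosure.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as Y-x-D-x-N-H-K-P-E into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat count, e.g. x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                      # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            token = re.escape(core)           # a single fixed residue
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# First class I PMI pattern from the text; the sequence below is a made-up toy example.
pmi_regex = prosite_to_regex("Y-x-D-x-N-H-K-P-E")
toy_sequence = "MKAYLDQNHKPEGS"
hit = re.search(pmi_regex, toy_sequence)
if hit:
    # The final E of the match is the residue annotated as the zinc ligand.
    print("match at", hit.start(), "zinc ligand E at index", hit.end() - 1)
```

The same helper can be applied to the other consensus patterns in this listing, provided the pattern text is first restored to well-formed notation.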

[ 1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem. 219:415-423(1994).
[ 2] Coulin F., Magnenat E., Proudfoot A.E.I., Payton M.A., Scully P., Wells T.N.C. Biochemistry 32:14139-14144(1993).
[ 3] Cleasby A., Wonacott A., Skarzynski T., Hubbard R.E., Davies G.J., Proudfoot A.E.I., Bernard A.R., Payton M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996).

[1156] 417. (PNP UDP 1) Purine and other phosphorylases family 1 signature
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the cleavage of guanosine or inosine to respective bases and sugar-1-phosphate molecules [1].

- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and mammals. Catalyzes the cleavage of uridine into uracil and ribose-1-phosphate. The products of the reaction are used either as carbon and energy sources or in the rescue of pyrimidine bases for nucleotide synthesis [2].

- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from Sulfolobus solfataricus [3].

As a signature pattern, a conserved region was selected in the central part of these enzymes.
[1157] Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L

- Note: mammalian and some bacterial PNP as well as eukaryotic MTA phosphorylase belong to a different family of phosphorylases (see <PDOC00954>).

[ 1] Takehara M., Ling F., Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem. 59:1987-1990(1995).
[ 2] Watanabe S.-I., Hino A., Wada K., Eliason J.F., Uchida T. J. Biol. Chem. 270:12191-12196(1995).
[ 3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem. 269:24762-24769(1994).

[1158] 418. (PP2C) Protein phosphatase 2C signature
Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine specific protein phosphatases (EC 3.1.3.16). PP2C [1] is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least four PP2C homologs: phosphatase PTC1 [2], which has weak tyrosine phosphatase activity in addition to its activity on serines, phosphatases PTC2 and PTC3, and hypothetical protein YBR125c. Isozymes of PP2C are also known from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leishmania chagasi and Paramecium tetraurelia.

In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) [3] is an enzyme that dephosphorylates the Ser/Thr receptor-like kinase RLK5 and which contains a C-terminal PP2C domain.

PP2C does not seem to be evolutionarily related to the main family of serine/threonine phosphatases: PP1, PP2A and PP2B. However, it is significantly similar to the catalytic subunit of pyruvate dehydrogenase phosphatase (EC 3.1.3.43) (PDPC) [4], which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.
As a signature pattern, the best conserved region was selected, which is located in the N-terminal part and contains a perfectly conserved tripeptide. This region includes a conserved aspartate residue involved in divalent cation binding [5].

[1159] Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

- Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as Bacillus spoIIE, rsbU and rsbW, Synechocystis PCC 6803 icfG as well as a domain in fungal adenylate cyclases.

[ 1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G. FEBS Lett. 297:135-138(1992).
[ 2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. 13:5408-5417(1993).
[ 3] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[ 4] Lawson J.E., Niu X.-D., Browning K.S., Trong H.L., Yan J., Reed L.J. Biochemistry 32:8987-8993(1993).
[ 5] Das A.K., Helps N.R., Cohen P.T.W., Barford D. EMBO J. 24:6798-6809(1996).
[ 6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).

[1160] 419. (PPTA) Protein prenyltransferases alpha subunit repeat signature

Protein prenyltransferases catalyze the transfer of an isoprenyl moiety to a cysteine four residues from the C-terminus of several proteins. They are heterodimeric enzymes consisting of alpha and beta subunits. The alpha subunit is thought to participate in a stable complex with the isoprenyl substrate; the beta subunit binds the peptide substrate. Distinct protein prenyltransferases might share a common alpha subunit. Both the alpha and beta subunit show repetitive sequence motifs [1]. These repeats have distinct structural and functional implications and are unrelated to each other.
Known protein prenyltransferase alpha subunits are:

- Mammalian protein farnesyltransferase alpha subunit.

- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.

- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit consists of about 34 amino acids and is repeated five times. It contains an invariant tryptophan possibly involved in heterodimerization with the conserved phenylalanines in the repeated domains of the beta subunits, via hydrophobic bonds. The signature pattern for this domain is centered on the invariant tryptophan.

[1161] Consensus pattern: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR]
[1162] [ 1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992).

[1163] 420. (PR55) Protein phosphatase 2A regulatory subunit PR55 signatures

Protein phosphatase 2A (PP2A) is a serine/threonine phosphatase involved in many aspects of cellular function including the regulation of metabolic enzymes and proteins involved in signal transduction. PP2A is a trimeric enzyme that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit (PR65), also called subunit A; this complex then associates with a third variable subunit (subunit B), which confers distinct properties to the holoenzyme [1]. One of the forms of the variable subunit is a 55 Kd protein (PR55) which is highly conserved in mammals (where three isoforms are known to exist), Drosophila and yeast (gene CDC55). This subunit could perform a substrate recognition function or be responsible for targeting the enzyme complex to the appropriate subcellular compartment.

As signature patterns, two perfectly conserved sequences of 15 residues were selected; one located in the N-terminal region, the other in the center of the protein.

[1164] Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D

[1165] [ 1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994).
[1166] 421. N-(5'phosphoribosyl)anthranilate (PRA) isomerase

[1] Wilmanns M, Priestle JP, Niermann T, Jansonius JN;

J Mol Biol 1992;223:477-507.

[1167] 422. (PRK) Phosphoribulokinase signature

Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific to Calvin's reductive pentose phosphate cycle, which is the major route by which carbon dioxide is assimilated and reduced by autotrophic organisms. PRK catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into ribulose 1,5-bisphosphate, which is the substrate for RubisCO. PRKs of diverse origins show different properties with respect to the size of the protein, the subunit structure, or the enzymatic regulation. However, an alignment of the sequences of PRK from plants, algae, photosynthetic and chemoautotrophic bacteria shows that there are a few regions of sequence similarity. As a signature pattern one of these regions was selected.
[1168] Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E

[ 1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989).
[ 2] Gibson J.L., Chen J.-H., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990).

[1169] 423. (PRPP synt) Phosphoribosyl pyrophosphate synthetase signature

Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) catalyzes the formation of PRPP from ATP and ribose 5-phosphate. PRPP is then used in various biosynthetic pathways, as for example in the formation of purines, pyrimidines, histidine and tryptophan. PRPP synthetase requires inorganic phosphate and magnesium ions for its stability and activity.

In mammals, three isozymes of PRPP synthetase are found; in yeast there are at least four isozymes.
As a signature pattern for this enzyme, a very conserved region was selected that has been suggested to be involved in binding divalent cations [1]. This region contains two conserved aspartic acid residues as well as a histidine, which are all potential ligands for a cation such as magnesium.

[1170] Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D

[1171] [ 1] Bower S.G., Harlow K.W., Switzer R.L., Hove-Jensen B. J. Biol. Chem. 264:10287-10291(1989).

[1172] 424. (PRTP) Herpesvirus processing and transport protein

The members of this family are associated with capsid intermediates during packaging of the virus.
Number of members: 31
[1] Medline: 98362148
Herpes simplex virus type 1 cleavage and packaging proteins UL15 and UL28 are associated with B but not C capsids during packaging.
Yu D, Weller SK;
J Virol 1998;72:7428-7439.
[1173] 425. Photosystem I psaG / psaK (PSI PSAK) proteins signature

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and in cyanobacteria. PSI is composed of at least 14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about 7 to 9 Kd and are evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to encode only PSI-K.

[1174] As a signature pattern, the best-conserved region was selected, which seems to correspond to the second transmembrane region.

- Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA]

[ 1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).
[ 2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem. 268:18912-18916(1993).

[1175] 426. PTR2 family proton/oligopeptide symporters signatures

A family of eukaryotic and prokaryotic proteins that seem to be mainly involved in the intake of small peptides with the concomitant uptake of a proton has been recently characterized [1,2]. Proteins that belong to this family are:

- Fungal peptide transporter PTR2.
- Mammalian intestine proton-dependent oligopeptide transporter PeptT1.
- Mammalian kidney proton-dependent oligopeptide transporter PeptT2.
- Drosophila opt1.
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein NTR1).
- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1.
- Lactococcus proton-dependent di- and tri-peptide transporter dtpT.
- Caenorhabditis elegans hypothetical protein C06G8.2.
- Caenorhabditis elegans hypothetical protein F56F4.5.
- Caenorhabditis elegans hypothetical protein K04E7.2.
- Escherichia coli hypothetical protein ybgH.
- Escherichia coli hypothetical protein ydgR.
- Escherichia coli hypothetical protein yhiP.
- Escherichia coli hypothetical protein yjdL.
- Bacillus subtilis hypothetical protein yclF.

These integral membrane proteins are predicted to comprise twelve transmembrane regions. As signature patterns, two of the best conserved regions were selected. The first is a region that includes the end of the second transmembrane region, a cytoplasmic loop as well as the third transmembrane region. The second pattern corresponds to the core of the fifth transmembrane region.

- Consensus pattern: IGAHGAS]4LI\^FYWA]-[LI VM]-[GASpD-x4LI VMFYMT]4U\^FYW]-G-^ 
20 [GSTAV)-x-ILIVMF]-x(3HGA] 

- Consensus pattern: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF]

[ 1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994).
[ 2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995).

[1176] 427. Pumilio-family RNA binding domains (aka PUM-HD, Pumilio homology domain)
Puf domains are necessary and sufficient for sequence specific RNA binding in fly Pumilio and worm FBF-1 and FBF-2. Both proteins function as translational repressors in early embryonic development by binding sequences in the 3' UTR of target mRNAs (e.g. the nanos response element (NRE) in fly Hunchback mRNA, or the point mutation element (PME) in worm fem-3 mRNA). Other proteins that contain Puf domains are also plausible RNA binding proteins. JSN1_YEAST, for instance, appears to also contain a single RRM domain by HMM analysis.

Puf domains usually occur as a tandem repeat of 8 domains.

The Pfam model does not necessarily recognize all 8 domains in all sequences; some sequences appear to have 5 or 6 domains on initial analysis, but further analysis suggests the presence of additional divergent domains.
[1177] [1] Zhang B, Gallegos M, Puoti A, Durkin E, Fields S, Kimble J, Wickens MP; Nature 1997;390:477-484. [2] Zamore PD, Williamson JR, Lehmann R; RNA 1997;3:1421-1433.
[1178] 428. PWWP domain. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. The function of the domain is currently unknown. Number of members: 19
[1179] [1] Medline: 98282232, WHSC1, a 90 kb SET domain-containing gene, expressed in early development and homologous to a Drosophila dysmorphy gene maps in the Wolf-Hirschhorn syndrome critical region and is fused to IgH in t(4;14) multiple myeloma. Stec I, Wright TJ, van Ommen GJB, de Boer PAJ, van Haeringen A, Moorman AFM, Altherr MR, den Dunnen JT; Hum Mol Genet 1998;7:1071-1082.
[1180] 429. PX domain

Eukaryotic domain of unknown function present in phox proteins, PLD isoforms, a PI3K isoform.
Number of members: 71
[1] Medline: 97084820
Novel domains in NADPH oxidase subunits, sorting nexins, and PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP;
Protein Sci 1996;5:2353-2357.
[1181] 430. ParA family ATPase
[1] Medline: 91141297
A family of ATPases involved in active partitioning of diverse bacterial plasmids.
Motallebi-Veshareh M, Rouch DA, Thomas CM;
Mol Microbiol 1990;4:1455-1463.
Number of members: 122

[1182] 431. (Parvo coat) Parvovirus coat protein. 72 members.
[1183] 432. Pectinesterase signatures

Pectinesterase (EC 3.1.1.11) (pectin methylesterase) catalyzes the hydrolysis of pectin into pectate and methanol. In plants, it plays an important role in cell wall metabolism during fruit ripening. In plant bacterial pathogens such as Erwinia carotovora and in fungal pathogens such as Aspergillus niger, pectinesterase is involved in maceration and soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity [1,2,3]. Two of these regions were selected as signature patterns.
The first is based on a region in the N-terminal section of these enzymes; it contains a conserved tyrosine which may play a role in the catalytic mechanism [3]. The second pattern corresponds to the best conserved region, an octapeptide located in the central part of these enzymes.

- Consensus pattern: [GSTNP]-x(6)-[FYVHR]-[IVN]-[KEP]-x-G-[STIVKRQ]-Y-[DNQKRMV]-[EP]-x(3)-[LIMVA]
- Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G

[ 1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).
[ 2] Plastow G.S. Mol. Microbiol. 2:247-254(1988).
[ 3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992).

[1184] 433. Pentapeptide repeats (8 copies) 

These repeats are found in many cyanobacterial proteins. 

The repeats were first identified in hglK [1]. The function of these repeats is unknown.
The structure of this repeat has been predicted to be a beta-helix [2].
The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid. Number of members: 75
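The approximate repeat description A(D/N)LXX can be turned into a simple scan for tandem copies. The sketch below is only an illustration under that approximate description: the function name `longest_tandem_run` and the toy sequence are hypothetical, and a plain left-to-right regular expression scan is used here in place of the profile model by which the family is actually defined.

```python
import re

# One repeat unit: A, then D or N, then L, then any two residues.
REPEAT_RUN = re.compile(r"(?:A[DN]L..)+")


def longest_tandem_run(seq: str) -> int:
    """Return the number of consecutive A(D/N)LXX units in the longest run found."""
    runs = [len(m.group(0)) // 5 for m in REPEAT_RUN.finditer(seq)]
    return max(runs, default=0)


# Toy sequence with three consecutive units (ADLSG, ANLRG, ADLTN); not a real protein.
print(longest_tandem_run("MKADLSGANLRGADLTNPQ"))
```

Because the description is approximate, a scan of this kind would be expected to miss divergent copies and should be treated as a first-pass filter only.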
[1] Medline: 96062225
The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium Anabaena sp. strain PCC 7120.
Black K, Buikema WJ, Haselkorn R;
J Bacteriol 1995;177:6440-6448.
[2] Medline: 98318059
Structure and distribution of pentapeptide repeats in bacteria.
Bateman A, Murzin A, Teichmann SA;
Protein Sci 1998;7:1477-1480.
[3] Medline: 98316713
Characterisation of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel 'pentapeptide repeat' family of proteins.
Kieselbach T, Mant A, Robinson C, Schroder WP;
FEBS Lett 1998;428:241-244.
[1185] 434. Polypeptide deformylase

[1] Medline: 97002011
A new subclass of the zinc metalloproteases superfamily revealed by the solution structure of peptide deformylase.
Meinnel T, Blanquet S, Dardel F;
J Mol Biol 1996;262:375-386.
[2] Medline: 98332750
Solution structure of nickel-peptide deformylase.
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T;
J Mol Biol 1998;280:501-513.

Number of members: 21

[1186] 435. Peptidyl-tRNA hydrolase signatures

Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-tRNA to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be peptidyl-tRNA which drops off the ribosome during protein synthesis [1,2]. Bacterial PTH has been found [2,3] to be evolutionarily related to yeast hypothetical protein YHR189w.

PTH and YHR189w are proteins of about 200 amino acid residues. As signature patterns, two conserved regions were selected that each contain a histidine. The first of these regions is located in the N-terminal section, the other in the central part.

- Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]

- Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT]

[ 1] Garcia-Villegas M.R., De La Vega F.M., Galindo J.M., Segura M., Buckingham R.H., Guarneros G. EMBO J. 10:3549-3555(1991).
[ 2] De Vega F.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996).
[ 3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci. 4:2424-2428(1995).

[1187] 436. (Peptidase M17) Cytosol aminopeptidase signature

Cytosol aminopeptidase is a eukaryotic cytosolic zinc-dependent exopeptidase that catalyzes the removal of unsubstituted amino-acid residues from the N-terminus of proteins. This enzyme is often known as leucine aminopeptidase (EC 3.4.11.1) (LAP) but has been shown [1] to be identical with prolyl aminopeptidase (EC 3.4.11.5). Cytosol aminopeptidase is a hexamer of identical chains, each of which binds two zinc ions.

Cytosol aminopeptidase is highly similar to Escherichia coli pepA, a manganese dependent aminopeptidase. Residues involved in zinc ion-binding [2] in the mammalian enzyme are absolutely conserved in pepA where they presumably bind manganese.

A cytosol aminopeptidase from Rickettsia prowazekii [3] and one from Arabidopsis thaliana also belong to this family.
As a signature pattern for these enzymes, a perfectly conserved octapeptide was selected which contains two residues involved in binding metal ions: an aspartate and a glutamate.

- Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands]

- Note: these proteins belong to family M17 in the classification of peptidases [4,E1].
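Because this signature is a fixed octapeptide, locating it and the two annotated metal ligands needs only a substring search. The following is a small sketch under that assumption; the function name `metal_ligand_positions` and the toy sequence are hypothetical and serve only to illustrate the position arithmetic.

```python
MOTIF = "NTDAEGRL"  # N-T-D-A-E-G-R-L written without separators


def metal_ligand_positions(seq: str):
    """Return 1-based (D, E) positions for every occurrence of the octapeptide in seq."""
    hits = []
    start = seq.find(MOTIF)
    while start != -1:
        # D is the 3rd and E the 5th residue of the motif; start is a 0-based index.
        hits.append((start + 3, start + 5))
        start = seq.find(MOTIF, start + 1)
    return hits


# Toy sequence only; the motif begins at 1-based position 3.
print(metal_ligand_positions("GGNTDAEGRLAA"))
```

Reporting positions in 1-based residue numbering keeps the output directly comparable with sequence annotations.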

[ 1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. Biochem. Biophys. Res. Commun. 178:1459-1464(1991).
[ 2] Burley S.K., David P.R., Sweet R.M., Taylor A., Lipscomb W.N. J. Mol. Biol. 224:113-140(1992).
[ 3] Wood D.O., Solomon M.J., Speed R.R. J. Bacteriol. 175:159-165(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1188] 437. Assemblin (Peptidase family S21)
[1] Medline: 96399137
Three-dimensional structure of human cytomegalovirus protease.
Shieh HS, Kurumbail RG, Stevens AM, Stegeman RA, Sturman EJ, Pak JY, Wittwer AJ, Palmier MO, Wiegand RC, Holwerda BC, Stallings WC;
Nature 1996;383:279-282.
Number of members: 29

[1189] 438. Pollen proteins Ole e I family signature

The following plant pollen proteins, whose biological function is not yet known, are structurally related [1]:

- Olive tree pollen major allergen (Ole e I).
- Tomato anther-specific protein LAT52.
- Maize pollen-specific protein ZmC13.

These proteins are most probably secreted and consist of about 145 residues. As shown in the following schematic representation, there are six cysteines which are conserved in the sequence of these proteins. They seem to be involved in disulfide bonds.



xxxxxxCxCxxxJCCxxxxCxxxxxxxxxxxxxxxxxOtM 

conserved cysteine involved in a disulfide bond,
position of the pattern.

Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds]
[1190] [ 1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La Pena M.A., Lahoz C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).
[1191] 439. Pollen allergen

This family contains allergens Lol PI, PII and PIII from Lolium perenne.
Number of members: 49
[1] Medline: 90105394
Complete primary structure of a Lolium perenne (perennial rye grass) pollen allergen, Lol p III: comparison with known Lol p I and II sequences.
Ansari AA, Shenbagamurthi P, Marsh DG;
Biochemistry 1989;28:8665-8670.
[1192] 440. Porphobilinogen deaminase cofactor-binding site

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an enzyme involved in the biosynthesis of porphyrins and related macrocycles. It catalyzes the assembly of four porphobilinogen (PBG) units in a head to tail fashion to form hydroxymethylbilane.

The enzyme covalently binds a dipyrromethane cofactor to which the PBG subunits are added in a stepwise fashion. In the Escherichia coli enzyme (gene hemC), this cofactor has been shown [1] to be bound by the sulfur atom of a cysteine. The region around this cysteine is conserved in porphobilinogen deaminases from various prokaryotic and eukaryotic sources.

- Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA] [C is the cofactor attachment site]

[1193] [ 1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988).
[1194] 441. Presenilin

Mutations in presenilin-1 are a major cause of early onset Alzheimer's disease [2]. It has been found that presenilin-1 (Swiss:P49768) binds to beta-catenin in vivo [4]. This family also contains SPE proteins from C. elegans.
Number of members: 23
[1] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[2] Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;
Curr Opin Neurobiol 1997;7:683-688.
[3] Medline: 98099802
Interaction of presenilins with the filamin family of actin-binding proteins.
Zhang W, Han SW, McKeel DW, Goate A, Wu JY;
J Neurosci 1998;18:914-922.
[4] Medline: 99004850
Destabilisation of beta-catenin by mutations in presenilin-1 potentiates neuronal apoptosis.
Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat C, Staufenbiel M, Sommer B, van de Wetering M, Clevers H, Saftig P, De Strooper B, He X, Yankner BA;
Nature 1998;395:698-702.
[1195] 442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature

Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of beta-n-5'-monophosphates from phosphoribosylpyrophosphate (PRPP) and an enzyme specific amine. A number of PRTs are involved in the biosynthesis of purine, pyrimidine, and pyridine nucleotides, or in the salvage of purines and pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in purine salvage.

- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) (HGPRT or HPRT), which are involved in purine salvage.

- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved in pyrimidine biosynthesis.

- Amido phosphoribosyltransferase (EC 2.4.2.14), which is involved in purine biosynthesis.

- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is involved in purine salvage.

In the sequence of all these enzymes there is a small conserved region which may be involved in the enzymatic activity and/or be part of the PRPP binding site [1].

- Consensus pattem: [UVMFYWCTAHU\^]^LIVMAHLIVMFCHDE]-D-[LIVMS]4UVMHSTAN^^ 
x-ISTAR] 

Note: in position 11 of the pattern most of these enzymes have Gly.

[1196] [1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).
[1197] 443. (Pro CA)

Prokaryotic-type carbonic anhydrases signatures

Carbonic anhydrases (EC 4.2.1.1) (CA) are zinc metalloenzymes which catalyze the reversible hydration of carbon dioxide. In Escherichia coli, CA (gene cynT) is involved in recycling carbon dioxide formed in the bicarbonate-dependent decomposition of cyanate by cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate [1]. In photosynthetic bacteria and plant chloroplasts, CA is essential to inorganic carbon fixation [2]. Prokaryotic and plant chloroplast CA are structurally and evolutionarily related and form a family distinct from the one which groups the many different forms of eukaryotic CAs (see <PDOC00146>). Hypothetical proteins yadF from Escherichia coli and HI1301 from Haemophilus influenzae also belong to this family. Two signature patterns were developed for this family of enzymes. Both patterns contain conserved residues that could be involved in binding zinc (cysteine and histidine).

- Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP]

- Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[ 1] Guilloton M.B., Korte J.J., Lamblin A.F., Fuchs J.A., Anderson P.M. J. Biol. Chem. 267:3731-3734(1992).
[ 2] Fukuzawa H., Suzuki E., Komukai Y., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 89:4437-4441(1992).

[1198] 444. (Prolyl oligopep)

Prolyl oligopeptidase family serine active site

[1199] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related peptidases whose catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high degree of sequence conservation between these sequences.

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the C-terminal side of lysyl and argininyl residues.

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially from polypeptides having unsubstituted N-termini, provided that the penultimate residue is proline.

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13), which is responsible for the proteolytic maturation of the alpha-factor precursor.

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein with a free amino-terminus.

[1200] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800 amino acids).

[1201] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for yeast DPAP A.
[1202] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4].

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D.
[ 3] Polgar L., Szabo E.
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
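By way of illustration only, a consensus pattern written in the notation used above can be translated into a regular expression and applied to a candidate polypeptide sequence. The following is a minimal sketch under stated assumptions: the function names `prosite_to_regex` and `active_site_serine` and the constant `SERINE_OFFSET` are hypothetical and are not part of the original disclosure. The sketch handles only the notation elements that appear in this section (single residues, `x`, `[...]`, `{...}` and repeat counts).

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    tokens = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat such as (3) or (0,1).
        m = re.fullmatch(
            r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element
        )
        if m is None:
            raise ValueError("unrecognised pattern element: %r" % element)
        spec, low, high = m.groups()
        if spec == "x":
            token = "."                      # any residue
        elif spec.startswith("{"):
            token = "[^" + spec[1:-1] + "]"  # any residue except those listed
        else:
            token = spec                     # single residue or [allowed set]
        if high is not None:
            token += "{%s,%s}" % (low, high)
        elif low is not None:
            token += "{%s}" % low
        tokens.append(token)
    return "".join(tokens)

# Signature of the prolyl endopeptidase family as given in the text.
SIGNATURE = "D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2)"
SERINE_OFFSET = 25  # 0-based offset of the active-site S within a match

def active_site_serine(sequence):
    """Return the 1-based position of the signature serine, or None if absent."""
    hit = re.search(prosite_to_regex(SIGNATURE), sequence)
    return None if hit is None else hit.start() + SERINE_OFFSET + 1
```

For a sequence that contains the signature, `active_site_serine` returns the position of the putative catalytic serine; a return value of `None` indicates only that this particular pattern was not found.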

[1203] 445. (Pterin 4a)
Pterin 4 alpha carbinolamine dehydratase
[1204] Pterin 4 alpha carbinolamine dehydratase is also known as DCoH (dimerisation cofactor of hepatocyte nuclear factor 1-alpha).

[1205] Number of members: 11

[1206] [1] Cronk JD, Endrizzi JA, Alber T; Medline: 97052967. 'High-resolution structures of the bifunctional enzyme and transcriptional coactivator DCoH and its complex with a product analogue.' Protein Sci 1996;5:1963-1972.
[1207] 446. (Pyridox oxidase)

Pyridoxamine 5'-phosphate oxidase signature

[1208] Pyridoxamine 5'-phosphate oxidase (EC 1.4.3.5) is an FMN flavoprotein involved in the de novo synthesis of pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to pyridoxal-5-P. The sequences of the enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources show that this protein has been highly conserved throughout evolution.

PdxH is evolutionarily related [3] to one of the enzymes in the phenazine biosynthesis protein pathway, phzD (also known as phzG). As a signature pattern, a highly conserved region was selected, located in the C-terminal part of these enzymes.

- Consensus pattern: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R

[ 1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).

[ 2] Loubbardi A., Karst F., Guilloton M., Marcireau C. J. Bacteriol. 177:1817-1823(1995).

[ 3] Pierson L.S. III, Gaffney T., Lam S., Gong F. FEMS Microbiol. Lett. 134:299-307(1995).

[1209] 447. (Pyrophosphatase)
Inorganic pyrophosphatase signature

[1210] Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) [1,2] is the enzyme responsible for the hydrolysis of pyrophosphate (PPi), which is formed principally as the product of the many biosynthetic reactions that utilize ATP. All known PPases require the presence of divalent metal cations, with magnesium conferring the highest activity. Among other residues, a lysine has been postulated to be part of or close to the active site. PPases have been sequenced from bacteria such as Escherichia coli (homohexamer), thermophilic bacteria PS-3 and Thermus thermophilus, from the archaebacteria Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial isoform of PPase has been characterized which seems to be involved in energy production and whose activity is stimulated by uncouplers of ATP synthesis.

[1211] The sequences of PPases share some regions of similarity. As a signature pattern, a region was selected that contains three conserved aspartates that are involved in the binding of cations.

- Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]

[The three D's bind divalent metal cations]
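As a hedged illustration of how the three cation-binding aspartates of this signature might be located in a candidate sequence, the pattern can be written directly as a regular expression. The names `PPASE_SIGNATURE`, `ASP_OFFSETS` and `metal_binding_aspartates` are hypothetical and are introduced here only for the example.

```python
import re

# D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC] written as a regular expression.
PPASE_SIGNATURE = re.compile(r"D[SGDN]D[PE][LIVMF]D[LIVMGAC]")

# 0-based offsets of the three aspartates inside a match (elements 1, 3 and 6).
ASP_OFFSETS = (0, 2, 5)

def metal_binding_aspartates(sequence):
    """Return, for each non-overlapping match, the 1-based positions of the three D's."""
    hits = []
    for match in PPASE_SIGNATURE.finditer(sequence):
        hits.append(tuple(match.start() + offset + 1 for offset in ASP_OFFSETS))
    return hits
```

Each tuple in the result gives the positions of the three aspartates for one occurrence of the signature. An empty list means that the pattern was not found and does not by itself exclude pyrophosphatase activity.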

[ 1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman B.S. Biochim. Biophys. Acta 1038:338-345(1990).

[ 2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992).

[1212] 448. (Peptidase S26)
Signal peptidases I signatures.

[1213] Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the majority of exported pre-proteins; type II (gene lsp), which only processes lipoproteins; and a third type involved in the processing of pili subunits. SPase I (EC 3.4.21.89) is an integral membrane protein that is anchored in the cytoplasmic membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding in the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2 (genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns have been developed for these proteins. The first signature contains the putative active site serine, the second signature contains the putative active site lysine, which is not conserved in the SPC subunits, and the third signature corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all these proteins.

[1214] Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]

Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]
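Because the lysine signature is stated not to be conserved in the SPC subunits, the combination of signatures found in a sequence carries more information than any single hit. The sketch below is illustrative only. The dictionary keys and the helper names `signatures_found` and `looks_like_spc_subunit` are hypothetical, and the rule "serine signature present, lysine signature absent" is an assumption made for the example rather than a classification rule taken from the disclosure.

```python
import re

# The three signal peptidase I signatures, written as regular expressions.
SPASE_SIGNATURES = {
    "serine": re.compile(r"[GS].SM.[PS][AT][LF]"),
    "lysine": re.compile(r"KR[LIVMSTA]{2}G.[PG]G[DE].[LIVM].[LIVMFY]"),
    "c_terminal": re.compile(r"[LIVMFYW]{2}.{2}GD[NH].{3}[SND].{2}[SG]"),
}

def signatures_found(sequence):
    """Return the names of the signatures that occur at least once in the sequence."""
    return {name for name, regex in SPASE_SIGNATURES.items() if regex.search(sequence)}

def looks_like_spc_subunit(sequence):
    """Assumed heuristic: serine signature present while the lysine signature is absent."""
    found = signatures_found(sequence)
    return "serine" in found and "lysine" not in found
```

A sequence in which all three signatures are found is consistent with a prokaryotic SPase I or an IMP1/IMP2-type protease, whereas a sequence lacking only the lysine signature is consistent with an SPC subunit. Either result would need confirmation by other means.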

[1215] [ 1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992).
[ 2] Sung M., Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992).
[ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993).
[ 4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993).
[ 5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1216] 449. (Peptidase C1) Eukaryotic thiol (cysteine) proteases active sites. Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1; rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A and RD21A.
- House-dust mites allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonica (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see PDOC00382).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.

[1217] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]
- Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser).
- Note: the residue in position 5 of the pattern is always Gly except in papaya protease IV where it is Glu.

[1218] Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-[LIVMF] [N is the active site residue]
- Note: these proteins belong to families C1 (papain-type) and C2 (calpains) in the classification of peptidases [7].

[1219] [ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1220] 450. (peptidase M24) Aminopeptidase P and proline dipeptidase signature (1).

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region that contains three histidine residues has been developed.

[1221] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]
[1222] [ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1223] Methionine aminopeptidase signatures (2). Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern that includes residues known to be involved in cobalt-binding has been developed.
[1224] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[1225] [ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1226] 451. Cytochrome P450 cysteine heme-iron ligand signature

Cytochrome P450's [1,2,3] are a group of enzymes involved in the oxidative metabolism of a high number of natural compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc.) as well as drugs, carcinogens and mutagens. Based on sequence similarities, P450's have been classified into about forty different families [4,5]. P450's are proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102), which is a protein of 1048 residues that contains an N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From a region around this residue, a ten residue signature was developed specific to P450's.
[1227] Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]
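The ten residue signature above lends itself to a simple check of where the putative heme-binding cysteine lies along a candidate sequence. The following sketch is offered only as an example. `P450_SIGNATURE`, `CYS_OFFSET` and `heme_cysteines` are hypothetical names, and the reporting of a relative position is an assumption added to reflect the statement that the conserved cysteine lies in the C-terminal part of the protein.

```python
import re

# [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD] written as a regular expression.
P450_SIGNATURE = re.compile(r"[FW][SGNH].[GD].[RHPT].C[LIVMFAP][GAD]")
CYS_OFFSET = 7  # 0-based offset of the heme-ligand cysteine within a match

def heme_cysteines(sequence):
    """Return (1-based position, fraction of sequence length) for each signature cysteine."""
    length = len(sequence)
    results = []
    for match in P450_SIGNATURE.finditer(sequence):
        position = match.start() + CYS_OFFSET + 1
        results.append((position, position / length))
    return results
```

A fraction well above 0.5 is consistent with a cysteine in the C-terminal part of the sequence. For a fusion protein of the BM-3 type, in which a reductase domain follows the P450 domain, the fraction would be expected to be lower.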

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).

[ 2] Coon M.J., Ding X., Pernecky S.J., Vaz A.D.N. FASEB J. 6:669-673(1992).

[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen R., Gonzalez F.J., Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).

[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

[1228] 452. (Pec Lyase) Pectate lyase

This enzyme forms a right-handed beta helix structure. Pectate lyase is an enzyme involved in the maceration and soft rotting of plant tissue.

[1229] [1] Yoder MD, Keen NT, Jurnak F. Science 1993;260:1503-1507.

[1230] 453. (pep M24) Aminopeptidase P and proline dipeptidase signature (pep1)

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region was selected that contains three histidine residues.

[1231] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1232] Methionine aminopeptidase signatures (pep2)

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAP studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAP, but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern was developed that includes residues known to be involved in cobalt-binding.

[1233] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1234] 454. Peroxidases signatures

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of biosynthetic and degradative functions using hydrogen peroxide as the electron acceptor. Peroxidases are widely distributed throughout bacteria, fungi, plants, and vertebrates. In peroxidases the heme prosthetic group is protoporphyrin IX and the fifth ligand of the heme iron is a histidine (known as the proximal histidine). Another histidine residue (the distal histidine) serves as an acid-base catalyst in the reaction between hydrogen peroxide and the enzyme. The regions around these two active site residues are more or less conserved in a majority of peroxidases [2,3]. The enzymes in which one or both of these regions can be found are listed below.

- Yeast cytochrome c peroxidase (EC 1.11.1.5).
- Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and monocytes and plays a major role in the oxygen-dependent microbicidal system of neutrophils.
- Lactoperoxidase (EC 1.11.1.7) (LPO). LPO is a milk protein which acts as an antimicrobial agent.
- Eosinophil peroxidase (EC 1.11.1.7) (EPO). An enzyme found in the cytoplasmic granules of eosinophils.
- Thyroid peroxidase (EC 1.11.1.8) (TPO). TPO plays a central role in the biosynthesis of thyroid hormones. It catalyzes the iodination and coupling of the hormonogenic tyrosines in thyroglobulin to yield the thyroid hormones T3 and T4.
- Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin. It depolymerizes lignin by catalyzing the C(alpha)-C(beta) cleavage of the propyl side chains of lignin.
- Plant peroxidases (EC 1.11.1.7). Plants express a large number of isozymes of peroxidases. Some of them play a role in cell-suberization by catalyzing the deposition of the aromatic residues of suberin on the cell wall, some are expressed as a defense response toward wounding, others are involved in the metabolism of auxin and the biosynthesis of lignin.
- Prokaryotic catalase-peroxidases. Some bacterial species produce enzymes that exhibit both catalase and broad-spectrum peroxidase activities [4]. Examples of such enzymes are: catalase HP I from Escherichia coli (gene katG) and perA from Bacillus stearothermophilus.

[1235] Consensus pattern: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY] [H is the proximal heme-binding ligand]

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is an active site residue]

[ 1] Dawson J.H. Science 240:433-439(1988).

[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988).

[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990).
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991).

[1236] 455. pfkB family of carbohydrate kinases signatures

It has been shown [1,2,3] that the following carbohydrate and purine kinases are evolutionarily related and can be grouped into a single family, which is known [1] as the 'pfkB family':

- Fructokinase (EC 2.7.1.4) (gene scrK).
- 6-phosphofructokinase isozyme 2 (EC 2.7.1.11) (phosphofructokinase-2) (gene pfkB). pfkB is a minor phosphofructokinase isozyme in Escherichia coli and is not evolutionarily related to the major isozyme (gene pfkA). Plant 6-phosphofructokinases also belong to this family.
- Ribokinase (EC 2.7.1.15) (gene rbsK).
- Adenosine kinase (EC 2.7.1.20) (gene ADK).
- 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene: kdgK).
- 1-phosphofructokinase (EC 2.7.1.56) (fructose 1-phosphate kinase) (gene fruK).
- Inosine-guanosine kinase (EC 2.7.1.73) (gene gsk).
- Tagatose-6-phosphate kinase (EC 2.7.1.144) (phosphotagatokinase) (gene lacC).
- Escherichia coli hypothetical protein yeiC.
- Escherichia coli hypothetical protein yeiI.
- Escherichia coli hypothetical protein yhfQ.
- Escherichia coli hypothetical protein yihV.
- Bacillus subtilis hypothetical protein yxdC.
- Yeast hypothetical protein YJR105w.

All the above kinases are proteins of from 280 to 430 amino acid residues that share a few regions of sequence similarity. Two of these regions were selected as signature patterns. The first pattern is based on a region rich in glycine which is located in the N-terminal section of these enzymes, while the second pattern is based on a conserved region in the C-terminal section.
[1237] Consensus pattern: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G
Consensus pattern: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFYA]-[LIVMSTAP]

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol. 173:3117-3127(1991).
[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990).
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990).

[1238] 456. Phospholipase A2 active sites signatures

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the second carbon group of glycerol. PA2's are small and rigid proteins of 120 amino-acid residues that have four to seven disulfide bonds. PA2 binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals. In the latter, there are at least four forms: pancreatic, membrane-associated, as well as two less characterized forms. The venom of most snakes contains multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two different signature patterns were derived for PA2's. The first is centered on the active site histidine and contains three cysteines involved in disulfide bonds. The second is centered on the active site aspartic acid and also contains three cysteines involved in disulfide bonds.

[1239] Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue]. This pattern will not detect some snake toxins homologous with PA2 but which have lost their catalytic activity, as well as otoconin-22, a Xenopus protein from the aragonitic otoconia which is also unlikely to be enzymatically active.

Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site residue]. The pattern detects the majority of functional and non-functional PA2's. Undetected sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2 PA-5 from mulga.
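The second PA2 pattern contains an exclusion element, {LIVMFYWPCST}, meaning any residue other than those listed. As a hedged sketch, both patterns can be expressed as regular expressions in which the exclusion becomes a negated character class. The names `PA2_HIS`, `PA2_ASP` and `pa2_active_sites` are hypothetical and serve only this example.

```python
import re

# C-C-x(2)-H-x(2)-C : centered on the active-site histidine.
PA2_HIS = re.compile(r"CC.{2}H.{2}C")

# [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C : the {...} element excludes the listed residues,
# which corresponds to a negated character class.
PA2_ASP = re.compile(r"[LIVMA]C[^LIVMFYWPCST]CD.{5}C")

def pa2_active_sites(sequence):
    """Return 1-based positions of the signature His and Asp (None where not found)."""
    his = PA2_HIS.search(sequence)
    asp = PA2_ASP.search(sequence)
    return {
        # In both patterns the active-site residue is the fifth element of the match.
        "histidine": his.start() + 5 if his else None,
        "aspartate": asp.start() + 5 if asp else None,
    }
```

As noted in the text, a missing hit for either pattern does not rule out a PA2-related protein, since several known sequences escape detection by one pattern or the other.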

[ 1] Davidson F.F., Dennis E.A. J. Mol. Evol. 31:228-238(1990).

[ 2] Gomez F., Vandermeers A., Vandermeers-Piret M.-C., Herzog R., Rathe J., Stievenart M., Winand J., Christophe J. Eur. J. Biochem. 186:23-33(1989).

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site. Phosphorylases (EC 2.4.1.1) [1] are important allosteric enzymes in carbohydrate metabolism. They catalyze the formation of glucose 1-phosphate from polyglucose such as glycogen, starch or maltodextrin. Enzymes from different sources differ in their regulatory mechanisms and their natural substrates. However, all known phosphorylases share catalytic and structural properties. They are pyridoxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved and can be used as a signature pattern to detect this class of enzymes.
[1241] Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P attachment site]
[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:129-144(1982).
[1242] 458. Protein kinases signatures and profile

Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of protein kinases. Two of these regions were selected to build signature patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region, which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important for the catalytic activity of the enzyme [6]. Two signature patterns were derived for that region: one specific for serine/threonine kinases and the other for tyrosine kinases. A profile was also developed which is based on the alignment in [1] and covers the entire catalytic domain.

[1243] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]. The majority of known protein kinases belong to the class detected by this pattern, but it fails to find a number of them, especially viral kinases, which are quite divergent in this region and are completely missed by this pattern.

Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]. Most serine/threonine specific protein kinases belong to the class detected by the pattern, with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC, which have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase specific pattern described below.
[1244] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]. ALL tyrosine specific protein kinases, with the exception of human ERBB3 and mouse blk, belong to the class detected by the pattern. This pattern will also detect most bacterial aminoglycoside phosphotransferases [8,9] and herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionarily related to protein kinases. The profile also detects receptor guanylate cyclases and 2-5A-dependent ribonucleases. Sequence similarities between these two families and the eukaryotic protein kinase family have been noticed before. It also detects Arabidopsis thaliana kinase-like protein TMKL1, which seems to have lost its catalytic activity. If a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase is close to 100%. Eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus xanthus [11] and Yersinia pseudotuberculosis.
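The statement that a protein containing both kinase signatures is very likely a protein kinase suggests a two-step test: first the ATP-binding signature, then one of the two active-site signatures. The sketch below illustrates that reasoning under stated assumptions. The names `ATP_BINDING`, `SER_THR_ACTIVE_SITE`, `TYR_ACTIVE_SITE` and `classify_kinase` are hypothetical, and the decision to return `None` when only one signature is present is a simplification made for the example.

```python
import re

# ATP-binding region; {P}, {PW} and {PD} become negated classes, x(5,18) a bounded repeat.
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
# Active-site region specific for serine/threonine kinases (conserved Lys after the Asp).
SER_THR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
# Active-site region specific for tyrosine kinases.
TYR_ACTIVE_SITE = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")

def classify_kinase(sequence):
    """Return 'serine/threonine' or 'tyrosine' when both signatures occur, else None."""
    if not ATP_BINDING.search(sequence):
        return None
    if SER_THR_ACTIVE_SITE.search(sequence):
        return "serine/threonine"
    if TYR_ACTIVE_SITE.search(sequence):
        return "tyrosine"
    return None
```

Consistent with the exceptions listed above, a `None` result should not be read as evidence against kinase function: divergent viral kinases may lack the ATP-binding signature, and a few serine/threonine kinases are detected only by the tyrosine-specific pattern.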

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995).
[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991).
[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991).
[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991).
[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988).
[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M. Science 253:407-414(1991).
[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988).
[ 8] Benner S. Nature 329:21-21(1987).
[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992).
[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992).
[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991).

[1245] Receptor tyrosine kinase class II signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain.
However, they can be classified into at least five groups. The prototype for class II RTK's is the insulin receptor,
a heterotetramer of two alpha and two beta chains linked by disulfide bonds. The alpha and beta chains are cleavage
products of a precursor molecule. The alpha chain contains the ligand binding site; the beta chain traverses the
membrane and contains the tyrosine protein kinase domain. The receptors currently known to belong to class II are:

- Insulin receptor from vertebrates. - Insulin growth factor I receptor from mammals. - Insulin receptor-related receptor
(IRR), which is most probably a receptor for a peptide belonging to the insulin family. - Insect insulin-like receptors. -
Molluscan insulin-related peptide(s) receptor (MIP-R). - Insulin-like peptide receptor from Branchiostoma lanceolatum.
- The Drosophila developmental protein sevenless, a putative receptor for positional information required for the
formation of the R7 photoreceptor cells. - The trk family of receptors (NTRK1, NTRK2 and NTRK3), which are high affinity
receptors for nerve growth factor and related neurotrophic factors (BDNF and NT-3). And the following uncharacterized
receptors: - ROS. - LTK (TYK1). - EDDR1 (cak, TRKE, RTK6). - NTRK3 (Tyro10, TKT). - A sponge putative receptor
tyrosine kinase.

While only the insulin and the insulin growth factor I receptors are known to exist in the tetrameric
conformation specific to class II RTK's, all the above proteins share extensive homologies in their kinase domain,
especially around the putative site of autophosphorylation. Hence, a signature pattern was developed for this class of
RTK's, which includes the tyrosine residue, itself probably autophosphorylated.
[1246] Consensus pattern: [DN]-[LIV]-Y-x(3)-Y-Y-R [The second Y is the autophosphorylation site]
[1247] [ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
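
The consensus patterns in this section follow a compact notation: elements are separated by "-", "x" stands for any residue, "[...]" lists the allowed residues, "{...}" lists excluded residues, "(n)" or "(n,m)" gives a repeat count, and "<" or ">" anchors the pattern to the N- or C-terminus. As a minimal sketch, and not the tool actually used to generate these signatures, such a pattern can be translated into a regular expression as follows. The function name `prosite_to_regex` and the test peptide are hypothetical and introduced only for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as '[DN]-[LIV]-Y-x(3)-Y-Y-R' into a regex."""
    # Drop surrounding whitespace and any trailing '.' or '-' left by typesetting.
    pattern = pattern.strip().rstrip(".-")
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")   # '<' anchors to the N-terminus
        anchor_end = element.endswith(">")       # '>' anchors to the C-terminus
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                            # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"        # excluded residues
        # '[ABC]' and single letters are already valid regex fragments.
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[DN]-[LIV]-Y-x(3)-Y-Y-R")
    # Hypothetical test peptide written to contain one occurrence of the pattern.
    peptide = "RDIYETDYYRKGGKGL"
    hit = re.search(regex, peptide)
    print(regex, hit.group() if hit else None)
```

The translation is purely syntactic; it does not score partial matches the way a profile would, so it only reports exact occurrences of the pattern.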
[1248] Receptor tyrosine kinase class III signature

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain.
However, they can be classified into at least five groups. The class III RTK's are characterized by the presence
of five to seven immunoglobulin-like domains [2] in their extracellular section. Their kinase domain differs from that of
other RTK's by the insertion of a stretch of 70 to 100 hydrophilic residues in the middle of the domain. The receptors
currently known to belong to class III are: - Platelet-derived growth factor receptor (PDGF-R). PDGF-R exists as a
homo- or heterodimer of two related chains: alpha and beta [3]. - Macrophage colony stimulating factor receptor
(CSF-1-R) (also known as the fms oncogene). - Stem cell factor (mast cell growth factor) receptor (also known as the kit
oncogene). - Vascular endothelial growth factor (VEGF) receptors Flt-1 and Flk-1/KDR [4]. - Fl cytokine receptor
Flk-2/Flt-3 [5]. - The putative receptor Flt-4 [6]. A signature pattern was developed for this class of RTK's which is
based on a conserved region in the kinase domain.

[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989).
[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990).
[ 4] Terman B.I., Dougher-Vermazen M., Carrion M.E., Dimitrov D., Armellino D.C., Gospodarowicz D., Boehlen
P. Biochem. Biophys. Res. Commun. 187:1579-1586(1992).
[ 5] Lyman S.D., James L., Vanden Bos T., de Vries P., Brasel K., Gliniak B., Hollingsworth L.T., Picha K.S., McKenna
H.J., Splett R.R. Cell 75:1157-1167(1993).
[ 6] Galland F., Karamysheva A., Pebusque M.J., Borg J.P., Rottapel R., Dubreuil P., Rosnet O., Birnbaum D.
Oncogene 8:1233-1240(1993).

[1250] Receptor tyrosine kinase class V signatures

A number of growth factors stimulate mitogenesis by interacting with a family of cell surface receptors which possess
an intrinsic, ligand-sensitive, protein tyrosine kinase activity [1]. These receptor tyrosine kinases (RTK) all share the
same topology: an extracellular ligand-binding domain, a single transmembrane region and a cytoplasmic kinase domain.
However, they can be classified into at least five groups on the basis of sequence similarities. The extracellular
domain of class V RTK's consists of a region of about 300 amino acids, among which are 16 conserved cysteines probably
involved in disulfide bonds; this region is followed by two copies of a fibronectin type III domain. The ligands for these
receptors are proteins of about 200 to 300 residues collectively known as ephrins. The receptors currently known to
belong to class V are [2,3,E1]: - EPHA1 (Eph-1; Esk). - EPHA2 (Eck; Mpk-5; Sek-2). - EPHA3 (Etk-1; Hek; Mek4;
Tyro4; Rek4; Cek4). - EPHA4 (Sek; Hek8; Mpk-3; Cek8). - EPHA5 (Ehk-1; Hek7; Bsk; Cek7). - EPHA6 (Ehk-2). -
EPHA7 (Ehk-3; Hek11; Mdk-1; Ebk). - EPHA8 (Eek). - EPHB1 (Eph-2; Elk; Net). - EPHB2 (Eph-3; Hek5; Drt; Erk; Nuk;
Sek-3; Cek5; Qek5). - EPHB3 (Hek-2; Mdk-5). - EPHB4 (Htk; Mdk-2; Myk-1). - EPHB5 (Cek9). The EPHA subtype
receptors bind to GPI-anchored ephrins while the EPHB subtype receptors bind to type-I membrane ephrins. Two
signature patterns were developed for this class of RTK's, each of which includes some of the conserved cysteine residues.
[1251] Consensus pattern: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-[SA]-[LV]-[KRHQ]-[LIVA]-x(3)-[KR]-C-
[PSAW] [The two C's are probably involved in disulfide bonds]

Consensus pattern: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ] [The three C's are
probably involved in disulfide bonds]

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).
[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991).
[ 3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A. 89:1611-1615(1992).

[1252] 459. Protein kinase C terminal domain
[1253] 460. Plant thionins signature

Thionins are small, basic, plant proteins generally toxic to animal cells [1]. They seem to exert their toxic effect at the
level of the cell membrane but their exact function is not known. They consist of a polypeptide chain of forty-five to fifty
amino acids with three to four internal disulfide bonds. They are found in seeds but also in the cell wall of leaves [2].
Thionins are processed from larger precursor proteins [3]. Crambin [4], a hydrophobic plant seed protein, also belongs
to this family. The pattern to detect this family of proteins includes three of the six cysteine residues involved in disulfide
bonds.

xxCCxxxxxxxxxxxCxxxxxxxxxCxxxCxxCxxxxxCxxxxxxxx
  **************

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

[1254] Consensus pattern: C-C-x(5)-R-x(2)-[FY]-x(2)-C [The three C's are involved in disulfide bonds] The proteins
from the gamma-thionin family are not related to the above proteins and are described in a separate section.

[ 1] Vernon L.P., Evett G.E., Zeikus R.D., Gray W.R. Arch. Biochem. Biophys. 238:18-29(1985).
[ 2] Bohlmann H., Clausen S., Behnke S., Giese H., Hiller C., Reimann-Philipp U., Schrader G., Barkholt V., Apel
K. EMBO J. 7:1559-1565(1988).
[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987).
[ 4] Teeter M.M., Mazer J.A., L'Italien J.J. Biochemistry 20:5437-5443(1981).
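
Because the text states that crambin belongs to the thionin family, the thionin pattern can be checked against a crambin sequence. The sketch below is an illustration only: the regular expression is a hand translation of C-C-x(5)-R-x(2)-[FY]-x(2)-C, and the 46-residue crambin sequence is the commonly cited one, quoted here from memory as an assumption rather than taken from this document. The helper name `find_thionin_signature` is hypothetical.

```python
import re

# Hand translation of the consensus pattern C-C-x(5)-R-x(2)-[FY]-x(2)-C.
THIONIN_RE = re.compile(r"CC.{5}R.{2}[FY].{2}C")

# Assumption: commonly cited crambin sequence (46 residues), not from this document.
CRAMBIN = "TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN"


def find_thionin_signature(sequence: str):
    """Return (matched segment, 1-based positions of cysteines inside the match)."""
    match = THIONIN_RE.search(sequence)
    if match is None:
        return None
    segment = match.group()
    cysteines = [match.start() + i + 1 for i, residue in enumerate(segment) if residue == "C"]
    return segment, cysteines


if __name__ == "__main__":
    print(find_thionin_signature(CRAMBIN))
```

Under these assumptions the match covers the two adjacent cysteines and the next cysteine downstream, which agrees with the statement that the pattern includes three of the six cysteines.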

[1255] 461. Polyprenyl synthetases signatures

A variety of isoprenoid compounds are synthesized by various organisms. For example, in eukaryotes the isoprenoid
biosynthetic pathway is responsible for the synthesis of a variety of end products including cholesterol, dolichol,
ubiquinone or coenzyme Q. In bacteria this pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones, and
sugar carrier lipids. Among the enzymes that participate in that pathway are a number of polyprenyl synthetase
enzymes which catalyze a 1'-4 condensation between 5-carbon isoprene units. Currently the sequence of some of these
enzymes is known: - Eukaryotic farnesyl pyrophosphate synthetase (FPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10), which
catalyzes the sequential condensation of isopentenyl pyrophosphate (IPP) with dimethylallyl pyrophosphate (DMAPP),
and then with the resultant geranyl pyrophosphate to form farnesyl pyrophosphate. FPP synthetase is a cytoplasmic
dimeric enzyme. - Prokaryotic farnesyl pyrophosphate synthetase (gene ispA). - Prokaryotic octaprenyl diphosphate
synthase (gene ispB). - Prokaryotic heptaprenyl diphosphate synthase (EC 2.5.1.30). - Eukaryotic geranylgeranyl
pyrophosphate synthetase (GGPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10 / EC 2.5.1.29), which catalyzes the sequential
addition of the three molecules of IPP onto DMAPP to form geranylgeranyl pyrophosphate. In plants GGPP synthase
is a chloroplast enzyme involved in the biosynthesis of terpenoids; in fungi, such as Neurospora crassa (gene al-3),
this enzyme is involved in the biosynthesis of carotenoids. - Prokaryotic GGPP synthetases, which are involved in the
biosynthesis of carotenoids (gene crtE). Such an enzyme is also encoded in the cyanelle genome of Cyanophora
paradoxa. - Eukaryotic hexaprenyl pyrophosphate synthetase, which is involved in the biosynthesis of coenzyme Q
and which catalyzes the formation of all-trans polyprenyl pyrophosphates generally ranging in length between 6
and 10 isoprene units depending on the species. HP synthetase is a mitochondrial membrane-associated enzyme.

It has been shown [1 to 5] that all the above enzymes share some regions of sequence similarity. Two of these regions
are rich in aspartic-acid residues and could be involved in the catalytic mechanism and/or the binding of the substrates.
Signature patterns were developed for both regions. Possible additional members of this family of proteins are: - Bacillus
subtilis spore germination protein C3 (gene gerC3). Both proteins are most probably also enzymes involved in
isoprenoid metabolism [6].

[1256] Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH]
Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG]

[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990).
[ 2] Fujisaki S., Hara H., Nishimura Y., Horiuchi K., Nishino T. J. Biochem. 108:995-1000(1990).
[ 3] Carattoli A., Romano N., Ballario P., Morelli G., Macino G. J. Biol. Chem. 266:5854-5859(1991).
[ 4] Kuntz M., Roemer S., Suire C., Hugueney P., Weil J.H., Schantz R., Camara B. Plant J. 2:25-34(1992).
[ 5] Math S.K., Hearst J.E., Poulter C.D. Proc. Natl. Acad. Sci. U.S.A. 89:6761-6764(1992).
[ 6] Bairoch A. Unpublished observations (1993).

[1257] 462. Potato inhibitor I family signature

The potato inhibitor I family is one of the numerous families of serine proteinase inhibitors. Members of this protein
family are found in plants: in the seeds of barley or beans [1,2,3], and in potato or tomato leaves where they accumulate
in response to mechanical damage [4,5]. An inhibitor belonging to this family is also found in leech [6]. It is interesting
to note that, currently, this is the only proteinase inhibitor family to be found in both the plant and animal kingdoms.
Structurally these inhibitors are small (60 to 90 residues) and, in contrast with other families of protease inhibitors, they
lack disulfide bonds. They have a single inhibitory site. The consensus pattern includes three out of the four residues
conserved in all members of this family and is located in the N-terminal half.

Consensus pattern: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A. Barley subtilisin-chymotrypsin inhibitor-2b has
Glu instead of Gly. There is a trypsin inhibitor from the cucurbitaceae Momordica charantia [7], which is said to belong
to the potato inhibitor I family but which shows only a very weak similarity with the other members of this family.

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984).
[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982).
[ 3] Nozawa H., Yamagata H., Aizono Y., Yoshikawa M., Iwasaki T. J. Biochem. 106:1003-1008(1989).
[ 4] Cleveland T.E., Thornburg R.W., Ryan C.A. Plant Mol. Biol. 8:199-207(1987).
[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G., Pearson G.D., Ryan C.A.
Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986).
[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem. 361:1841-1846(1980).
[ 7] Zeng F.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988).

[1258] 463. (pp binding) Phosphopantetheine attachment site

Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some
multienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these proteins [2]. ACP proteins or domains have
been found in various enzyme systems which are listed below (references are only provided for recently determined
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA,
malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which
correspond to the different enzymatic activities; ACP is one of these polypeptides. Fungal FAS consists of two multifunctional
proteins, FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS consists of a
single multifunctional enzyme; the ACP domain is located between the beta-ketoacyl reductase domain and the
C-terminal thioesterase domain [3]. - Polyketide antibiotics synthase enzyme systems. Polyketides are secondary
metabolites produced from simple fatty acids by microorganisms and plants. ACP is one of the polypeptidic components
involved in the biosynthesis of Streptomyces polyketide antibiotics actinorhodin, curamycin, granaticin, monensin,
oxytetracycline and tetracenomycin C. - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM, which
respectively contain three, five and one ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS)
from Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic and
which contains an ACP domain in the C-terminal extremity. - Multifunctional mycocerosic acid synthase (gene mas)
from Mycobacterium bovis. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the
first step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene tycA) from Bacillus
brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB)
from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine
and leucine. GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora
erythraea, which are involved in the biosynthesis of the polyketide antibiotic erythromycin. Each of these proteins contains
two ACP domains. - Conidial green pigment synthase from Aspergillus nidulans. - ACV synthetase from various fungi.
This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains three ACP domains.
- Enterobactin synthetase component F (gene entF) from Escherichia coli. This enzyme is involved in the ATP-dependent
activation of serine during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase
subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains while subunit 3 only contains
a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme synthesizes HC-toxin,
a cyclic tetrapeptide. HTS1 contains four ACP domains. - Fungal mitochondrial ACP [9], which is part of the respiratory
chain NADH dehydrogenase (complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP in the
synthesis of the nodulation Nod factor fatty acyl chain.

The sequence around the phosphopantetheine attachment site
is conserved in all these proteins and can be used as a signature pattern. A profile was also developed that spans the
complete ACP-like domain.

[1259] Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PCFY}-
[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA] [S is the pantetheine
attachment site]

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965).
[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).
[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini A.M. Gene 130:65-71
(1993).
[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem. 200:463-469(1991).
[1260] 464. (Prenyltrans) Terpene synthases signature

The following enzymes catalyze mechanistically related reactions which involve the highly complex cyclic
rearrangement of squalene or its 2,3 oxide: - Lanosterol synthase (EC 5.4.99.7) (oxidosqualene-lanosterol cyclase), which
catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones
and vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC 5.4.99.8)
(2,3-epoxysqualene-cycloartenol cyclase), a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to
cycloartenol. - Hopene synthase (EC 5.4.99.-) (squalene-hopene cyclase), a bacterial enzyme that catalyzes the cyclization
of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism. These enzymes are evolutionarily related [1]
proteins of about 70 to 85 Kd. As a signature pattern, a highly conserved region was selected which is rich in aromatic
residues and which is located in the C-terminal section.

[1261] Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA]

[1262] [ 1] Corey E.J., Matsuda S.P.T., Bartel B. Proc. Natl. Acad. Sci. U.S.A. 90:11628-11632(1993).

[1263] 465. Prion protein signatures

Prion protein (PrP) [1,2,3] is a small glycoprotein found in high quantity in the brains of humans or animals infected
with a number of degenerative neurological diseases such as Kuru, Creutzfeldt-Jacob disease (CJD), scrapie or bovine
spongiform encephalopathy (BSE). PrP is encoded in the host genome and expressed both in normal and infected
cells. It has a tendency to aggregate, yielding polymers called rods. Structurally, PrP is a protein consisting of a signal
peptide, followed by an N-terminal domain that contains tandem repeats of a short motif (PHGGGWGQ in mammals,
PHNPGY in chicken), itself followed by a highly conserved domain. Finally comes a C-terminal hydrophobic domain
post-translationally removed when PrP is attached to the extracellular side of the cell membrane by a GPI-anchor. The
structure of PrP is shown in the following schematic representation:

| Sig | Tandem repeats | C          C | GPI |

'C': conserved cysteine involved in a disulfide bond. '*': position of the patterns.

As signature patterns for PrP, a perfectly conserved
alanine- and glycine-rich region of 16 residues was selected as well as a region centered on the second cysteine
involved in the disulfide bond.

[1264] Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y

Consensus pattern: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y [C is involved in a disulfide bond]

[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991).
[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988).
[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989).

[1265] 466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature and profile (pro isomerase)

Cyclophilin [1] is the major high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A
(CSA). It exhibits a peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme
that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides
[2]. It is probable that CSA mediates some of its effects via an inhibitory action on PPIase. Cyclophilin is a cytosolic
protein which belongs to a family [3,4,5] that also includes the following isozymes: - Cyclophilin B (or S-cyclophilin), a
PPIase which is retained in an endoplasmic reticulum compartment. - Cyclophilin C, a cytoplasmic PPIase. -
Mitochondrial matrix cyclophilin (cyp3). - A PPIase which seems specific for the folding of rhodopsin and is an integral membrane
protein anchored by a C-terminal transmembrane region. This protein was first characterized in Drosophila (gene
ninaA). - Bacterial periplasmic PPIase (gene ppiA). - Bacterial cytosolic PPIase (gene ppiB). - Natural-killer cell
cyclophilin-related protein. This large protein (about 160 Kd) is a component of a putative tumor-recognition complex involved
in the function of NK cells. It contains a cyclophilin-type PPIase domain. - Mammalian nucleoporin Nup358 [6], a nuclear
pore complex protein of 358 Kd that contains a C-terminal cyclophilin-type PPIase domain. - Yeast hypothetical protein
YJR032W. - Fission yeast hypothetical protein SpAC21E11.05c. - Caenorhabditis elegans hypothetical protein
T27D1.1. The sequences of the different forms of cyclophilin-type PPIases are well conserved. As a signature pattern,
a conserved region was selected in the central part of these enzymes.

[1266] Consensus pattern: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G. FKBPs, a
family of proteins that bind the immunosuppressive drug FK506, are also PPIases, but their sequence is not at all
related to that of cyclophilin.

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992).
[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993).
[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).
[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem. 270:14209-14213(1995).

[1267] 467. Profilin signature

Profilin [1,2] is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1 ratio, thus preventing the
polymerization of actin into filaments (F-actin). It can also, in certain circumstances, promote actin polymerization.
Profilin also binds to polyphosphoinositides such as PIP2. Overall sequence similarity among profilins from organisms
which belong to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well
conserved. That region is thought to be involved in the binding to actin. The signature pattern for profilin is based on
conserved residues at the N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).
[1268] Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]

[ 1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990).
[ 2] Sohn R.H., Goldschmidt-Clermont P.J. BioEssays 16:465-472(1994).
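
The profilin pattern begins with "<", which restricts a match to the N-terminal extremity of the sequence. A minimal sketch of that behaviour follows, assuming a hand translation of the pattern into a regular expression in which "<" becomes "^" and x(0,1) becomes an optional residue. Both peptides are hypothetical strings written for illustration; they are not sequences from this document.

```python
import re

# Hand translation of <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ];
# '^' plays the role of the '<' N-terminal anchor.
PROFILIN_RE = re.compile(r"^.?[STA].?W[DENQH].[YI].[DEQ]")


def has_profilin_signature(sequence: str) -> bool:
    """True only when the pattern matches starting at the first residue."""
    return PROFILIN_RE.search(sequence) is not None


if __name__ == "__main__":
    # Hypothetical peptide carrying the motif at its N-terminus.
    print(has_profilin_signature("MSWQAYVDEHL"))
    # Same motif moved away from the N-terminus: the anchor prevents a match.
    print(has_profilin_signature("GGGGMSWQAYVDEHL"))
```

The two optional positions let the same expression tolerate sequences with or without a leading residue (for example an initiator methionine) ahead of the [STA] position.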

[1269] 468. Protamine P1 signature

Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two
different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is
sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of
mammalian P1 [1]. As a signature for this family of proteins, a conserved region was selected at the N-terminal extremity
of the sequence.

[1270] Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S
[1271] [ 1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989).
[1272] 469. Sperm histone P2 (protamine P2)
This protein, also known as protamine P2, can substitute for histones in the chromatin of sperm. The alignment contains
both the sequence of the mature P2 protein and its propeptide.
[1273] 470. Proteasome A-type subunits signature

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the
proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of
about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups,
A and B. Subunits that belong to the A-type group are proteins of from 210 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C2 (nu), C3, C8, C9, iota and zeta. - Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35. - Yeast C1 (PRS1),
C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2. - Arabidopsis thaliana subunits alpha and PSM30.
- Thermoplasma acidophilum alpha-subunit. In this archaebacterium the proteasome is composed of only two different
subunits. As a signature pattern for proteasome A-type subunits the best conserved region was selected, which is
located in the N-terminal part of these proteins.

[1274] Consensus pattern: Y-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG].
These proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes
the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring)
of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups,
A and B. Subunits that belong to the B-type group are proteins of from 190 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below. - Vertebrate subunits
C5, beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1. - Yeast PRE1, PRE2
(PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3. - Drosophila L(3)73Ai. - Fission yeast pts1. - Thermoplasma
acidophilum beta-subunit. In this archaebacterium the proteasome is composed of only two different subunits. As a signature
pattern for proteasome B-type subunits the best conserved region was selected, which is located in the N-terminal part
of these proteins.

[1276] Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D. These proteins belong to family T1 in the classification of
peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).
[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1277] 471. (pyr redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site

The pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins which contain a pair of redox-active
cysteines involved in the transfer of reducing equivalents from the FAD cofactor to the substrate. On the basis of
sequence and structural similarities [1] these enzymes can be classified into two categories. The first category groups
together the following enzymes [2 to 6]: - Glutathione reductase (EC 1.6.4.2) (GR). - Higher eukaryotes thioredoxin
reductase (EC 1.6.4.5). - Trypanothione reductase (EC 1.6.4.8). - Lipoamide dehydrogenase (EC 1.8.1.4), the E3
component of alpha-ketoacid dehydrogenase complexes. - Mercuric reductase (EC 1.16.1.1). The sequence around
the two cysteines involved in the redox-active disulfide bond is conserved and can be used as a signature pattern.
[1278] Consensus pattern: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P [The two C's form the active site disulfide bond]. In
positions 6 and 7 of the pattern all known sequences have Asn-(Val/Ile) with the exception of GR from plant chloroplasts
and from cyanobacteria, which have Ile-Arg [7].

[ 1] Kuriyan J., Krishna T.S.R., Wong L., Guenther B., Pahler A., Williams C.H. Jr., Model P. Nature 352:172-174
(1991).
[ 2] Rice D.W., Schulz G.E., Guest J.R. J. Mol. Biol. 174:483-496(1984).
[ 3] Brown N.L. Trends Biochem. Sci. 10:400-402(1985).
[ 4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989).
[ 5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. 16:305-309(1991).
[ 6] Gasdaska P.Y., Gasdaska J.R., Cochran S., Powis G. FEBS Lett. 373:5-9(1995).
[ 7] Creissen G., Edwards E.A., Enard C., Wellburn A., Mullineaux P. Plant J. 2:129-131(1991).
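
The remark about positions 6 and 7 of the class-I active site pattern can be illustrated by extracting those two residues from a match. The sketch below assumes a hand translation of G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P into a regular expression; the two peptides are hypothetical strings constructed to fit the pattern (one with Asn-Val, one with Ile-Arg) and are not sequences taken from this document. The helper name `positions_6_and_7` is likewise an assumption.

```python
import re

# Hand translation of G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P.
CLASS_I_RE = re.compile(r"GG.C[LIVA].{2}GC[LIVM]P")


def positions_6_and_7(sequence: str):
    """Return the residues at pattern positions 6 and 7, or None if no match."""
    match = CLASS_I_RE.search(sequence)
    if match is None:
        return None
    # Pattern positions are 1-based; positions 6 and 7 are the x(2) element.
    return match.group()[5:7]


if __name__ == "__main__":
    # Hypothetical peptides built to satisfy the pattern.
    for peptide in ("AAGGTCLNVGCIPSK", "AAGGTCVIRGCVPSK"):
        print(peptide, positions_6_and_7(peptide))
```

A pair such as "NV" or "NI" would correspond to the Asn-(Val/Ile) case described above, and "IR" to the Ile-Arg case.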

[1279] 472. (pyridoxal deC) DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site
Three different enzymes - all pyridoxal-dependent decarboxylases - seem to share regions of sequence similarity
[1,2,3,4], especially in the vicinity of the lysine residue which serves as the attachment site for the pyridoxal-phosphate
(PLP) group. These enzymes are: - Glutamate decarboxylase (EC 4.1.1.15) (GAD). Catalyzes the decarboxylation of
glutamate into the neurotransmitter GABA (4-aminobutanoate). - Histidine decarboxylase (EC 4.1.1.22) (HDC).
Catalyzes the decarboxylation of histidine to histamine. There are two completely unrelated types of HDC: those that use
PLP as a cofactor (found in Gram-negative bacteria and mammals), and those that contain a covalently bound pyruvoyl
residue (found in Gram-positive bacteria). - Aromatic-L-amino-acid decarboxylase (EC 4.1.1.28) (DDC), also known
as L-dopa decarboxylase or tryptophan decarboxylase. DDC catalyzes the decarboxylation of tryptophan to tryptamine.
It also acts on 5-hydroxy-tryptophan and dihydroxyphenylalanine (L-dopa). - Tyrosine decarboxylase (EC 4.1.1.25)
(TyrDC), which converts tyrosine into tyramine, a precursor of isoquinoline alkaloids and various amides. These enzymes
are collectively known as group II decarboxylases [3,4].

[1280] Consensus pattern: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-x(2)-
[RK] [K is the pyridoxal-P attachment site]

[ 1] Jackson F.R. J. Mol. Evol. 31:325-329(1990).
[ 2] Joseph D.R., Sullivan P., Wang Y.-M., Kozak C., Fenstermacher D.A., Behrendsen M.E., Zahnow C.A. Proc.
Natl. Acad. Sci. U.S.A. 87:733-737(1990).
[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).
[ 4] Ishii S., Mizuguchi H., Nishino J., Hayashi H., Kagamiyama H. J. Biochem. 120:369-376(1996).

[1281] 473. Regulator of chromosome condensation (RCC1) signatures (RCC1)

The regulator of chromosome condensation (RCC1) [1] is a eukaryotic protein which binds to chromatin and interacts
with ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting
as a guanine-nucleotide dissociation stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important
role in the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in
Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in
the following schematic representation, the repeats make up the major part of the length of the protein. Outside the
repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a
C-terminal domain of about 130 residues.

| N-t. | Rpt. 1 | Rpt. 2 | Rpt. 3 | Rpt. 4 | Rpt. 5 | Rpt. 6 | Rpt. 7 | C-terminal (in Drosophila) |

Two signature patterns for RCC1 were developed. The first is found in the N-terminal part
of the second repeat; this is the most conserved part of RCC1. The second is derived from conserved positions in the
C-terminal part of each repeat and detects up to five copies of the repeated domain. The RCC1-type of repeat is also
found in the X-linked retinitis pigmentosa GTPase regulator [3].
[1282] Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T
Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]

[ 1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).
[ 2] Boguski M.S., McCormick F. Nature 366:643-654(1993).
[ 3] Roepman R., Van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-Wagemakers L.M., Bergen A.A.B.,
Post J., Beck A., Reinhardt R., Ropers H.-H., Cremers F., Berger W. Hum. Mol. Genet. 5:1035-1041(1996).

[1283] 474. RNA 3'-terminal phosphate cyclase signature (RCT)

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of 3'-phosphate to a 2',3'-cyclic
phosphodiester at the end of RNA. The biological role of this enzyme is unknown but it is likely to function in some aspects
of cellular RNA processing. The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the enzyme
by ATP; 2) the enzyme acts on RNA-3'terminal phosphate to produce RNA-3'terminal diphosphate adenylate; 3) release
of AMP and cyclisation by a non-catalytic nucleophilic attack by the adjacent 2' hydroxyl on the phosphorus in
the diester linkage. This enzyme, which has been characterized in human (where there seem to be at least three
isozymes) and Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects, plants, fungi
(gene RTC1 in yeast) and in archaebacteria. RNA cyclase is a protein of from 36 to 42 Kd. The best conserved region,
which is used as a signature pattern, is a glycine-rich stretch of residues located in the central part of the sequence
and which is reminiscent of various ATP, GTP or AMP glycine-rich loops. In this context, the conserved Arg (His in the
E. coli enzyme) could be the AMP-binding residue.

[1284] Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]

[ 1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[ 2] Filipowicz W., Vincente O. Meth. Enzymol. 181:499-510(1990).

[1285] 475. REV protein (anti-repression trans-activator protein)

[1286] 476. Prokaryotic-type class I peptide chain release factors signature (RF-1)

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis [1]. At present two classes 
of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site 

and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs
and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner [2]: RF-1
(gene prfA) mediates UAA- and UAG-dependent termination while RF-2 (gene prfB) mediates UAA- and UGA-dependent
termination. RF-1 and RF-2 are structurally and evolutionarily related proteins which have been shown [3] to make up
a family that also contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which recognizes the
UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown function. - Escherichia coli hypothetical protein
yaeJ and a close Pseudomonas putida homolog. A highly conserved region located in the central part of the 40 to 45
Kd RF-1/2 and m-RF and in the N-terminal of the 15 to 16 Kd RF-H and yaeJ is used as a signature pattern.
[1287] Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity to prokaryotic-type class II, which belong
to the family of GTP-binding elongation factors, nor to eukaryotic class I or class II RFs.

[ 1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acids Res. Mol. Biol. 52:293-335(1996).
[ 2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990).
[ 3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4428(1992).
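
The codon specificity of the two prokaryotic class I release factors described in this entry can be written down as a small lookup. The following is a minimal illustrative sketch in Python; the names STOP_CODONS_BY_FACTOR and factors_for_codon are hypothetical and chosen only for this example, and only the codon assignments themselves are taken from the text.

```python
# Stop codons mediated by each prokaryotic class I release factor,
# as stated in the entry above (RF-1: UAA/UAG, RF-2: UAA/UGA).
STOP_CODONS_BY_FACTOR = {
    "RF-1": {"UAA", "UAG"},
    "RF-2": {"UAA", "UGA"},
}


def factors_for_codon(codon):
    """Return the sorted list of class I release factors for a codon.

    DNA-style input (T instead of U) is accepted; a sense codon
    yields an empty list.
    """
    rna = codon.upper().replace("T", "U")
    return sorted(
        factor
        for factor, codons in STOP_CODONS_BY_FACTOR.items()
        if rna in codons
    )


for codon in ("UAA", "UAG", "UGA", "AUG"):
    print(codon, factors_for_codon(codon))
```

UAA is the only stop codon served by both factors, which is why it appears in both sets.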


[1288] 477. RIO1/ZK632.3/MJ0444 family signature 

The following uncharacterized proteins are evolutionarily related [1]: - Yeast protein RIO1. - Caenorhabditis elegans
hypothetical protein ZK632.3. - Methanococcus jannaschii hypothetical protein MJ0444. - Thermoplasma acidophilum
hypothetical protein in rpoA2 3'region. The eukaryotic members of this family are proteins of about 55 to 60 Kd, while
the archaebacterial ones are half that size. The central part of these proteins is highly conserved. The best conserved
region is used as a signature pattern.

[1289] Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]
[1290] [ 1] Bairoch A. Unpublished observations (1997).
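
Consensus patterns of the kind listed in these entries follow the PROSITE notation: elements are separated by "-", "x" stands for any residue, square brackets list the residues allowed at a position, curly braces list residues that are excluded, and "(n)" or "(n,m)" gives a repeat count or range. As a minimal sketch, assuming only these notation rules, such a pattern can be translated into a regular expression and used to scan a sequence. The function name prosite_to_regex and the toy sequence are hypothetical and serve only as an illustration; this is not the search tool used in the present experiments.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a regex string."""
    tokens = []
    for element in pattern.strip().split("-"):
        # Split e.g. "x(2)" or "[LIVM](0,1)" into a core and a repeat part.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                           # any residue
            token = "."
        elif re.fullmatch(r"\[[A-Z]+\]", core):   # allowed residues
            token = core
        elif re.fullmatch(r"\{[A-Z]+\}", core):   # excluded residues
            token = "[^" + core[1:-1] + "]"
        elif re.fullmatch(r"[A-Z]", core):        # one fixed residue
            token = core
        else:
            raise ValueError(f"unsupported element: {element!r}")
        if low is not None:                       # repeat count or range
            token += "{" + low + ("," + high if high is not None else "") + "}"
        tokens.append(token)
    return "".join(tokens)


# The RIO1/ZK632.3/MJ0444 family pattern from the entry above.
RIO1_PATTERN = "[LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]"
rio1_regex = prosite_to_regex(RIO1_PATTERN)

toy_sequence = "MKLVHGDLSEYNILAAA"  # made-up sequence, for illustration only
hit = re.search(rio1_regex, toy_sequence)
print(rio1_regex, hit.start(), hit.group())
```

An element that cannot be parsed, such as one left incomplete by a truncated pattern, raises a ValueError instead of being guessed at.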
[1291] 478. (RIP) Shiga/ricin ribosomal inactivating toxins active site signature

A number of bacterial and plant toxins
act by inhibiting protein synthesis in eukaryotic cells. The toxins of the Shiga and ricin family inactivate 60S ribosomal
subunits by an N-glycosidic cleavage which releases a specific adenine base from the sugar-phosphate backbone of
28S rRNA [1,2,3]. The toxins which are known to function in this manner are: - Shiga toxin from Shigella dysenteriae
[4]. This toxin is composed of one copy of an enzymatically active A subunit and five copies of a B subunit responsible
for binding the toxin complex to specific receptors on the target cell surface. - Shiga-like toxins (SLT) are a group of
Escherichia coli toxins very similar in their structure and properties to Shiga toxin. The sequence of two types of these
toxins, SLT-1 [5] and SLT-2 [6], is known. - Ricin, a potent toxin from castor bean seeds. Ricin consists of two glycosylated
chains linked by a disulfide bond. The A chain is enzymatically active. The B chain is a lectin with a binding
preference for galactosides. Both chains are encoded by a single polypeptidic precursor. Ricin is classified as a type-II
ribosome-inactivating protein (RIP); other members of this family are agglutinin, also from castor bean, and abrin
from the seeds of the bean Abrus precatorius [7]. - Single chain ribosome-inactivating proteins (type-I RIP) from plants.
Examples of such proteins are: barley protein synthesis inhibitors I and II, Mongolian snake-gourd trichosanthin, sponge
gourd luffin-A and -B, garden four-o'clock MAP, common pokeberry PAP-S and soapwort saporin-6 [7]. All these toxins
are structurally related. A conserved glutamic residue has been implicated [8] in the catalytic mechanism; it is located
near a conserved arginine which also plays a role in catalysis [9]. The signature that has been developed for these
proteins includes these catalytic residues.

[1292] Consensus pattern: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF]
[E and R are active site residues]

[1293] [ 1] Endo Y., Tsurugi K., Takeda Y., Ogasawara T., Igarashi K. Eur. J. Biochem. 171:45-50(1988).
[ 2] May M.J., Hartley M.R., Roberts L.M., Krieg P.A., Osborn R.W., Lord J.M. EMBO J. 8:301-308(1989).
[ 3] Funatsu G., Islam M.R., Minami Y., Sung-Sil K., Kimura M. Biochimie 73:1157-1161(1991).
[ 4] Strockbine N.A., Jackson M.P., Sung L.M., Holmes R.K., O'Brien A.D. J. Bacteriol. 170:1116-1122(1988).
[ 5] Calderwood S.B., Auclair F., Donohue-Rolfe A., Keusch G.T., Mekalanos J.J. Proc. Natl. Acad. Sci. U.S.A. 84:4364-4368(1987).
[ 6] Jackson M.P., Neill R.J., O'Brien A.D., Holmes R.K., Newland J.W. FEMS Microbiol. Lett. 44:109-114(1987).
[ 7] Barbieri L., Battelli M.G., Stirpe F. Biochim. Biophys. Acta 1154:237-282(1993).
[ 8] Hovde C.J., Calderwood S.B., Mekalanos J.J., Collier R.J. Proc. Natl. Acad. Sci. U.S.A. 85:2568-2572(1988).
[ 9] Monzingo A.F., Collins E.J., Ernst S.R., Irvin J.D., Robertus J.D. J. Mol. Biol. 233:705-715(1993).

[1294] 479. Bacterial RNA polymerase, alpha chain (RNA pol A bac) 

Members of this family include alpha subunit from eubacteria and alpha subunits from chloroplasts. The alpha subunit
of RNA polymerase consists of two independently folded domains, referred to as amino-terminal and carboxyl-terminal
domains. The amino-terminal domain is involved in the interaction with the other subunits of the RNA polymerase. The
carboxyl-terminal domain interacts with the DNA and activators. The amino acid sequence of the alpha subunit is
conserved in prokaryotic and chloroplast RNA polymerases. There are three regions of particularly strong conservation,
two in the amino-terminal and one in the carboxyl-terminal [3].

[1] Zhang G, Darst SA; Science 1998;281:262-266. [2] Jeon YH, Negishi T, Shirakawa M, Yamazaki T, Fujita N, Ishihama
A, Kyogoku Y; Science 1995;270:1495-1497. [3] Ebright RH, Busby S; Curr Opin Genet Dev 1995;5:197-203. [4] Murakami
K, Kimura M, Owens JT, Meares CF, Ishihama A; Proc Natl Acad Sci USA 1997;94:1709-1714.
[1295] 480. RNA polymerase beta subunit (RNA pol B) 

RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase
compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Each RNA polymerase
complex contains two related members of this family; in each case they are the two largest subunits. [1] Falkenburg
D, Dworniczak B, Faust DM, Bautz EK; J Mol Biol 1987;195:929-937.
[1296] 481. RNA polymerases H / 23 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit H (gene rpoH) [1,2] is a small protein of about 8.5 to 10 Kd; it is evolutionarily
related to the C-terminal part of a 23 Kd component shared by all three forms of eukaryotic RNA polymerases (gene
RPB5 in yeast and POLR2E in mammals). As a signature pattern a conserved region was selected which is located at
the N-terminal extremity of subunit H; this region contains two histidines that could play a role in the binding of a metal ion.
[1297] Consensus pattern: H-[NE]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]

[ 1] Klenk H.-P., Palm P., Lottspeich F., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 89:407-410(1992).
[ 2] Thiru A., Hodach M., Eloranta J.J., Kostourou V., Weinzierl R.O., Matthews S. J. Mol. Biol. 287:753-760(1999).

[1298] 482. RNA polymerases K / 14 to 18 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. A component of 14 to 18 Kd shared by all three forms of eukaryotic RNA polymerases and which has
been sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene rpb6 or rpo15), in human and in African
swine fever virus [1] is evolutionarily related [2] to archaebacterial subunit K (gene rpoK). The archaebacterial protein
is colinear with the C-terminal part of the eukaryotic subunit.
[1299] Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q

[ 1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. It has been shown that small subunits of about 13 to 16 Kd found in all three types of eukaryotic polymerases
are highly conserved. Subunits known to belong to this family are: - Budding yeast RPC19 subunit from RNA
polymerases I and III [1]. - Budding yeast RPB11 subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene
POLR2K) from RNA polymerase II. - Caenorhabditis elegans hypothetical protein F58A4.9. - Methanococcus jannaschii
RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA polymerase subunit L (gene rpoL) [3]. As a
signature pattern a conserved region was selected which is located at the N-terminal extremity of these polymerase
subunits; this region contains two cysteines that could play a role in the binding of a metal ion.
[1301] Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P

[ 1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem. 266:15300-15307(1991).
[ 2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993).
[ 3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases N / 8 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionarily related [2]
to a 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and POLR2J
in mammals) as well as to African swine fever virus protein CP80R [3]. As a signature pattern a conserved region was
selected which is located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines
that could play a role in the binding of a metal ion.
[1303] Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G-

[ 1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 92:5768-5772(1995).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).
[ 3] Yanez R.J., Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela E. Virology 208:249-278
(1995).

[1304] 485. Ribonuclease HII

[1] Mian IS; Nucleic Acids Res 1997;25:3187-3189.
[1305] 486. Ribonuclease PH signature

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a phosphorolytic exoribonuclease that removes nucleotide
residues following the -CCA terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid residues. It is evolutionarily
related to Caenorhabditis elegans hypothetical protein B0564.1. As a signature pattern, the most highly conserved
region was selected which is located in the central part of these proteins.

Consensus sequence: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A
[ 1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).
[1306] 487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, Jansen-Durr P, Lavia P; Cell Growth
Differ 1995;6:1213-1224.

[1307] 488. Rhodanese signatures

Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the transfer of the sulfane
atom of thiosulfate to cyanide, to form sulfite and thiocyanate. In vertebrates, rhodanese is a mitochondrial enzyme of
about 300 amino-acid residues involved in forming iron-sulfur complexes and cyanide detoxification. A cysteine residue
takes part in the catalytic mechanism. Some bacterial proteins closely related to rhodanese are also thought to express
a sulfotransferase activity. These are: - Azotobacter vinelandii rhdA. - Escherichia coli sseA [3]. - Saccharopolyspora
erythraea cysA [4]. - Synechococcus strain PCC 7942 rhdA [5]. RhdA is a periplasmic protein probably involved in the
transport of sulfur compounds. Two patterns for the rhodanese family were developed. They are based on highly
conserved regions, one which is located in the N-terminal region, the other at the C-terminal extremity of the enzyme.
[1308] Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]
Consensus pattern: [FY]-[DEAF]-G-[SA]-W-x-E-[FYW]

[ 1] Westley J. Meth. Enzymol. 77:285-291(1981).
[ 2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).
[ 3] Rudd K.E. Unpublished observations (1993).
[ 4] Donadio S., Shafiee A., Hutchinson C.R. J. Bacteriol. 172:350-360(1990).
[ 5] Laudenbach D.E., Ehrhardt D., Green L., Grossman A.R. J. Bacteriol. 173:2751-2760(1991).

[1309] 489. Ribonuclease III family signature 

Prokaryotic ribonuclease III (EC 3.1.26.3) (gene rnc) [1] is an enzyme that digests double-stranded RNA. It is involved
in the processing of ribosomal RNA precursors and of some mRNAs. RNase III is evolutionarily related [2] to the following
proteins: - Fission yeast pac1, a ribonuclease that probably inhibits mating and meiosis by degrading a specific mRNA
required for sexual development. - Yeast ribonuclease III (gene RNT1), a dsRNA-specific nuclease that cleaves eukaryotic
preribosomal RNA at various sites. - Caenorhabditis elegans hypothetical protein F26E4.13. - Paramecium
bursaria chlorella virus 1 protein A464R. - Synechocystis strain PCC 6803 hypothetical protein slr0346. - Fission yeast
hypothetical protein SpAC8A4.08c, a protein with a N-terminal helicase domain and a C-terminal RNase III domain. -
Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same structure as SpAC8A4.08c. These proteins
share regions of sequence similarity, one of which is a highly conserved stretch of 9 residues which has been
developed as a signature pattern.

[1310] Consensus pattern: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]

[ 1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[ 2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[1311] 490. Rieske iron-sulfur protein signatures

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex III) is one of the electron
transport chains of mitochondria and of some aerobic prokaryotes; it catalyzes the oxidoreduction of ubiquinol and
cytochrome c. In the chloroplast of plants and in cyanobacteria plastoquinone-plastocyanin reductase (EC 1.10.99.1)
(also known as the b6f complex) is functionally similar and catalyzes the oxidoreduction of plastoquinol and cytochrome
f. One of the components of these electron transfer systems is an iron-sulfur protein with a 2Fe-2S cluster, which is
called the Rieske protein [1,2]. The Rieske protein contains approximately 190 amino acid residues. The iron-sulfur
cluster is complexed to the protein through cysteine and histidine residues. Two perfectly conserved regions in Rieske
proteins contain all the residues that bind the iron-sulfur cluster. Both regions contain two cysteines and a histidine.
The first cysteine and the histidine are 2Fe-2S ligands while the remaining cysteines form a disulfide bond [3]. Two
conserved regions were selected as signature patterns.

[1312] Consensus pattern: C-[TK]-H-L-G-C-[LIVST] [The first C and the H are 2Fe-2S ligands] [The second C is
involved in a disulfide bond]

Consensus pattern: C-P-C-H-x-[GSA] [The first C and the H are 2Fe-2S ligands] [The second C is involved in a disulfide
bond]

[ 1] Gatti D.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. J. Mol. Biol. 205:421-435(1989).
[ 2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988).
[ 3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996).
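
The two Rieske signature patterns also state which residues within each match are 2Fe-2S ligands and which cysteine takes part in the disulfide bond. The following minimal Python sketch, given only as an illustration, scans a sequence with both patterns and reports those positions. The names BOX_1, BOX_2, OFFSETS and rieske_sites are hypothetical, positions are 0-based, and the test sequence is made up rather than taken from a real Rieske protein.

```python
import re

# Regular-expression forms of the two consensus patterns given above.
BOX_1 = re.compile(r"C[TK]HLGC[LIVST]")  # C-[TK]-H-L-G-C-[LIVST]
BOX_2 = re.compile(r"CPCH.[GSA]")        # C-P-C-H-x-[GSA]

# Offsets inside a match: (first Cys, His, second Cys).
OFFSETS = {"box1": (0, 2, 5), "box2": (0, 3, 2)}


def rieske_sites(seq):
    """Return 0-based positions of putative cluster ligands and
    disulfide cysteines for every match of either pattern."""
    ligands, disulfide = [], []
    for name, regex in (("box1", BOX_1), ("box2", BOX_2)):
        first_cys, his, second_cys = OFFSETS[name]
        for match in regex.finditer(seq):
            # First Cys and His are annotated as 2Fe-2S ligands.
            ligands += [match.start() + first_cys, match.start() + his]
            # Second Cys is annotated as part of the disulfide bond.
            disulfide.append(match.start() + second_cys)
    return {"ligands": sorted(ligands), "disulfide": sorted(disulfide)}


# Made-up sequence containing one copy of each motif.
toy = "AAA" + "CTHLGCV" + "A" * 10 + "CPCHGS" + "AA"
sites = rieske_sites(toy)
print(sites)
```

A protein matching both patterns yields four ligand positions (two Cys, two His) and two disulfide cysteines, in line with the description in the entry.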

[1313] 491. Ribosomal protein L1 signature
Ribosomal protein L1 is the largest protein from the large ribosomal subunit. In Escherichia coli, L1 is known to bind to
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:
- Eubacterial L1. - Algal and plant chloroplast L1. - Cyanelle L1. - Archaebacterial L1. - Vertebrate L10A. - Yeast
SSM1. As a signature pattern, the best conserved region was selected, located in the central section of these proteins.
It is located at the end of an alpha helix thought to be involved in RNA-binding.

[1314] Consensus pattern: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-[LIMF]-P-
[DENSTKQ]

[ 1] Nikonov S.V., Nevskaya N., Eliseikina L.A., Fomenkova N.R., Nikulin A., Ossina N., Garber M., Jonsson B.-H.,
Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas A. EMBO J. 15:1350-1359(1996).
[ 2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

[1315] 492. Ribosomal protein L10 signature
Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a protein of 162 to 185 amino-acid
residues which has only been found so far in eubacteria. A conserved region located in the N-terminal section of
these proteins was used as a signature pattern.

[1316] Consensus pattern: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[
[1317] 493. Ribosomal protein L10e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant L10. - Caenorhabditis elegans L10 (F10B5.1). -
Yeast L10 (QSR1). - Methanococcus jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A conserved
region located in the central section was selected as a signature pattern.
[1318] Consensus pattern: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V

[ 1] Chan Y.-L., Diaz J.-J., Denorcy L., Madjar J.-J., Wool I.G. Biochem. Biophys. Res. Commun. 255:
952-956(1996).

[1319] 494. Ribosomal protein L11 signature

[1320] Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is
known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2], groups:

- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Cyanelle L11.
- Archaebacterial L11.
- Mammalian L12.
- Plants L12.
- Yeast L12 (YL15).

[1321] L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of
these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3]
to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.
[1322] Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQ]-[LIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]

[ 1] Pucciarelli G., Remacha M., Ballesta J.P.G. Nucleic Acids Res. 18:4409-4416(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[ 3] Choli T. Biochem. Int. 19:1323-1338(1989).

[1323] 495. Ribosomal protein L7/L12 C-terminal domain

[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579.
[1325] 496. Ribosomal protein L13 signature

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be
one of the early assembly proteins of the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which, on
the basis of sequence similarities [1], groups:

- Eubacterial L13.
- Plant chloroplast L13 (nuclear-encoded).
- Red algal chloroplast L13.
- Archaebacterial L13.
- Mammalian L13a (Tum P198).
- Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region was selected
located in the C-terminal section of these proteins.

[1327] Consensus pattern: IUVM]4KRN04Giq-M-[LIN^PS]-x(4,5)-[GS]4NQEKRA]-x(5)-[LIVIWO.x-[AIV]-[LF^^^ 
[GDN] 
[1328] [ 1] Chan Y.-L., Olvera J., Glueck A., Wool I.G. J. Biol. Chem. 269:5589-5594(1994).
[1329] 497. Ribosomal protein L13e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these
families consists of:

- Vertebrate L13 (was previously known as Breast Basic Conserved protein 1 (BBC1)).
- Drosophila L13.
- Plant L13.
- Yeast probable L13 (YM9375.11c).

These proteins have 199 to 218 amino-acid residues. As a signature pattern, a stretch of about 16 residues in the first
third of these proteins was selected.

- Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E

[1330] [ 1] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994).
[1331] 498. Ribosomal protein L14 signature

Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups:

- Eubacterial L14.
- Algal and plant chloroplast L14.
- Cyanelle L14.
- Archaebacterial L14.
- Yeast L17A.
- Mammalian L23.
- Caenorhabditis elegans L23 (B0336.10).
- Higher eukaryotes mitochondrial L14.
- Yeast mitochondrial Yml38 (gene MRPL38).

L14 is a protein of 119 to 137 amino-acid residues. As a signature pattern, a conserved region located in the C-terminal
half of these proteins was selected.

- Consensus pattern: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]

[1332] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1333] 499. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L15.
- Plant chloroplast L15 (nuclear-encoded).
- Archaebacterial L15.
- Vertebrate L27a.
- Tetrahymena thermophila L29.
- Fungi L27a (L29, CRP-1, CYH2).

L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region was selected in the
C-terminal section of these proteins.

- Consensus pattem: K-[LIVMJ(2)-IGASL]-x-{GT]-x-[LIVI^A]-x(2.5)-[UVM]-x-[LIVI^^^ 
A-x(3)-[LIVM]-x(3)-G 

[1334] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1335] 500. Ribosomal protein L15e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian L15.
- Insect L15.
- Plant L15.
- Yeast YL10 (L13) (Rp15R).
- Thermoplasma acidophilum L15.

These proteins have about 200 amino acid residues. As a signature pattern, a conserved region was selected located
in the central section.

ss - Consensus pattern: lDEl-(KR]-A-R-x-L-G-IFYhx-[SAPJ-x(2)-G-[LIVMFY](4)-R-x-R-[IVl-x-^^ 

[ 1] Zwickl P., Lupas A., Baumeister W. Biochem. Biophys. Res. Commun. 209:664-688(1995).
[1336] 501. Ribosomal protein L17 signature

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. L17 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups:

- Eubacterial L17.
- Yeast mitochondrial YmL8 (gene MRPL8).

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is twice as large (238 residues); the sequence
of its N-terminal half is colinear with that of eubacterial L17. As a signature pattern, a conserved region in the N-terminal
section was selected.

- Consensus pattern: l-x-[ST]-[GT]-x(2HKRJ-x-K-x(6)4DEJ-x-(LIMV]^LIVMT]-T^^ 
[1337] 502. Ribosomal protein L18e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrate L18 (known as L14 in Xenopus) [1].
- Plant L18.
- Yeast L18 (Rp28).
- Halobacterium marismortui HL29.
- Sulfolobus acidocaldarius HL29e.

These proteins have 115 to 187 amino-acid residues. A stretch of about 13 residues in the first third of these proteins
has been selected as a signature pattern.

- Consensus pattern: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-[LIVM]-x-[RK]-[LIVM]

[ 1] Puder M., Barnard G.F., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).
[1338] 503. Ribosomal L18p family

It has been shown that the amino terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S
rRNA in vitro. The carboxyl-terminal half of the protein, comprising amino acids 151-296, serves to localize the protein
to the nucleolus [1].
Number of members: 26
[1]
Medline: 96212235
Distinct domains in ribosomal protein L5 mediate 5 S rRNA binding and nucleolar localization.
Michael WM, Dreyfuss G;
J Biol Chem 1996;271:11571-11574.
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be
located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the aminoacyl-tRNA
binding site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L19.
- Red algal chloroplast L19.
- Cyanelle L19.

L19 is a protein of 120 to 130 amino-acid residues.

A conserved region in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-[KRGTI]-x-[GSAI]-[KRQDA]-[VG]-[RSN]-x(0,1)-[KR]-[SA]-[KY]-[KLI]-[LYS]-Y-[LIM]-R

[1340] 505. Ribosomal protein L19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian ribosomal protein L19 [1].
- Drosophila ribosomal protein L19 [2].
- Slime mold (D. discoideum) vegetative specific protein V14 [3].
- Yeast ribosomal protein L19 (YL14).
- Archaebacterial ribosomal protein L19E.

[1341] These proteins have 148 to 203 amino-acid residues.

A stretch of about 20 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: Q-[KR]-R-[LIVM]-x-[SA]-x(4)-[CV]-G-x(3)-[IV]-[WK]-[LIVF]-[DN]-P

[ 1] Chan Y.-L., Lin A., McNally J., Peleg D., Meyuhas O., Wool I.G. J. Biol. Chem. 262:1111-1115(1987).
[ 2] Hart K., Klein T., Wilcox M. Mech. Dev. 43:101-110(1993).
[ 3] Singleton C.K., Manning S.S., Ken R. Nucleic Acids Res. 17:9679-9692(1989).
[1342] 506. Ribosomal protein L1e signature (Ribosomal_L4)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists [1,2,3,4] of:

- Vertebrate L1 (L4).
- Drosophila L1.
- Plant L1.
- Yeast L2 (Rp2).
- Fission yeast L2.
- Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

These proteins have 246 (archaebacteria) to 427 (human) amino acids. A conserved region in the N-terminal part of
these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(3)-[KRM]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RK]-[GS]-H

[ 1] Rafti F., Gargiulo G., Manzi A., Malva C., Graziani F. Nucleic Acids Res. 17:456-466(1989).
[ 2] Presutti C., Villa T., Bozzoni I. Nucleic Acids Res. 21:3900-3900(1993).
[ 3] Bagni C., Mariottini P., Annesi F., Amaldi F. Biochim. Biophys. Acta 1216:475-478(1993).
[ 4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).

[1343] 507. Ribosomal protein L2 signature

Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L2.
- Algal and plant chloroplast L2.
- Cyanelle L2.
- Archaebacterial L2.
- Plant L2.
- Slime mold L2.
- Marchantia polymorpha mitochondrial L2.
- Paramecium tetraurelia mitochondrial L2.
- Fission yeast K5, K37 and KD4.
- Yeast YL6.
- Vertebrate L8.

The best conserved region, located in the C-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE]

[1] Marty I., Meyer Y. Nucleic Acids Res. 20:1517-1522(1992).
[2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1344] 508. Ribosomal protein L20 signature

Ribosomal protein L20 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L20 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L20.
- Algal and plant chloroplast L20.
- Cyanelle L20.

L20 is a protein of about 120 amino-acid residues. A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-[NS]-x(3)-[RKHS]

[1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1345] 509. Ribosomal protein L21e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L21 [1].
- Entamoeba histolytica L21 [2].
- Caenorhabditis elegans L21 (C14B9.7).
- Yeast L21E (URP1) [3].
- Halobacterium marismortui HL31 [4].

These proteins have 160 (eukaryotes) or 95 (archaebacteria) amino-acid residues. A conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G

[1] Devi K.R.G., Chan Y.-L., Wool I.G. Biochem. Biophys. Res. Commun. 162:364-370(1989).
[2] Petter R., Rozenblatt S., Nuchamowitz Y., Mirelman D. Mol. Biochem. Parasitol. 56:329-333(1992).
[3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993).
[4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988).

[1346] 510. Ribosomal protein L21 signature

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L21.
- Marchantia polymorpha chloroplast L21.
- Cyanelle L21.
- Spinach chloroplast L21 (nuclear-encoded).

Eubacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200 residues. A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]

[1347] 511. Ribosomal protein L22 signature

Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial L22.
- Algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast).
- Cyanelle L22.
- Archaebacterial L22.
- Mammalian L17.
- Plant L17.
- Yeast YL17.

A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]

[1] Gantt J.S., Baldauf S.L., Calie P.J., Weeden N.F., Palmer J.D. EMBO J. 10:3073-3078(1991).
[2] Madsen L.H., Kreiberg J.D., Gausing K. Curr. Genet. 19:417-422(1991).
[3] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1348] 512. Ribosomal protein L23 signature

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L23 is known to bind a specific region on the 23S rRNA; in yeast, the corresponding protein binds to a homologous site on the 26S rRNA [1]. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [2,3,4], groups:

- Eubacterial L23.
- Algal and plant chloroplast L23.
- Archaebacterial L23.
- Mammalian L23A.
- Caenorhabditis elegans L23A (F55D10.2).
- Fungi L25.
- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20).

[1349] A small conserved region in the C-terminal section of these proteins, which is probably involved in rRNA-binding, has been selected as a signature pattern [2].

- Consensus pattern: [RK)(2HA^fl^lVFYTHIVHRKT^L^STANEQK]-x(7HLIVM^ 

[1] El Baradi T.T.A.L., Raue H.A., van de Regt C.H.R., Verbree E.G., Planta R.J. EMBO J. 4:2101-2107(1985).
[2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989).
[3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992).
[4] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1350] 513. Ribosomal protein L24 signature

Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. L24 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L24.
- Plant chloroplast L24 (nuclear-encoded).
- Red algal L24.
- Vertebrate L26.
- Yeast L26 (YL33).
- Archaebacterial HmaL24 (HL15).
- A probable ribosomal protein from Sulfolobus acidocaldarius [1].

In their mature form, these proteins have 103 to 150 amino-acid residues.
A conserved stretch of 20 residues in their N-terminal section has been selected as a signature pattern.

- Consensus pattern: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KRA]-[GNQ]-x(2,3)-[GA]-x-[IV]

[1] Ouzounis C., Kyrpides N., Sander C. Nucleic Acids Res. 23:565-570(1995).

[1351] 514. Ribosomal protein L24e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists [1] of:

- Mammalian ribosomal protein L24.
- Yeast ribosomal protein L30A/B (Rp29) (YL21).
- Kluyveromyces lactis ribosomal protein L30.
- Arabidopsis thaliana ribosomal protein L24 homolog.
- Haloarcula marismortui ribosomal protein HL21/HL22.
- Methanococcus jannaschii MJ1201.

These proteins have 60 to 160 amino-acid residues. The most conserved region, which is located in the N-terminal region of these proteins, has been selected as a signature pattern.

- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

[1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 202:1176-1180(1994).
[1352] 515. Ribosomal protein L27 signature

Ribosomal protein L27 is one of the proteins from the large ribosomal subunit. L27 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L27.
- Plant chloroplast L27 (nuclear-encoded).
- Algal chloroplast L27.
- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).

The schematic relationship between these groups of proteins is shown below.

Eub. L27 Nxxxxxxxxx
Algal L27 Nxxxxxxxxx
Plant L27 tttttNxxxxxxxxxxxxx
Yeast MRP7 tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

't': transit peptide.
'N': N-terminal of mature protein.
'*': position of the pattern.

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G

[1] Elhag G.A., Bourque D.P. Biochemistry 31:6856-6864(1992).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1353] 516. Ribosomal L28 family

The ribosomal L28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast (Swiss: P36525) also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal subunit.
Number of members: 24
[1354] 517. Ribosomal protein L29 signature

Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L29.
- Red algal L29.
- Archaebacterial L29.
- Mammalian L35.
- Caenorhabditis elegans L35 (ZK652.4).
- Yeast L35.

L29 is a protein of 63 to 138 amino-acid residues.

A conserved region located in the central section of L29 has been selected as a signature pattern.

- Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRHQS]-[DESTANRL]-[LIV]-A-[KRCQVT]-[LIVMA]

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1355] 518. Ribosomal protein L3 signature

Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L3.
- Red algal L3.
- Cyanelle L3.
- Archaebacterial Halobacterium marismortui HmaL3 (HL1).
- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1).
- Arabidopsis thaliana L3 (genes ARP1 and ARP2).
- Mammalian L3 (L4).
- Mammalian mitochondrial L3.
- Yeast mitochondrial YmL9 (gene MRPL9).

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R

[1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[2] Graack H.-R., Grohmann, Kitakawa M., Schaefer K.L., Kruft V. Eur. J. Biochem. 206:373-380(1992).
[3] Herwig S., Kruft V., Wittmann-Liebold B. Eur. J. Biochem. 207:877-885(1992).
[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1356] 519. Ribosomal protein L30 signature

Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L30.
- Archaebacterial L30.
- Drosophila L7.
- Slime mold L7.
- Mammalian L7.
- Fungi L7 (YL8).
- Yeast mitochondrial L33.

L30 from eubacteria are small proteins of about 60 residues; those from archaebacteria are proteins of about 150 residues. Eukaryotic L7 are proteins of about 250 to 270 residues. The schematic relationship between the three groups of proteins is shown below.

Eub. L30 NxxxxxxxxxxC
Arc. L30 NxxxxxxxxxxxxxxxxxxxxxxxxxxxC
Euk. L7 NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxC

'*': position of the pattern.

The signature pattern for this family of ribosomal proteins spans the N-terminal half of the region common to all these proteins.

- Consensus pattem: [IVTh[UVMhx(2HLF]-x4LIJ-x-[KRHQEG]-x(2HSTNQHhx4IVThx(10)4LM 
VAhx(2HLMFYHIVT] 

[1] Mizuta K., Hashimoto T., Otaka E. Nucleic Acids Res. 20:1011-1016(1992).
[1357] 520. Ribosomal protein L31 signature

Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. L31 is a protein of 66 to 97 amino-acid residues which has only been found so far in eubacteria and in some algal chloroplasts.

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattem- H-P-F-{FYl-rnpx(9)-G-R-[AIVhx-[KRQ] 

[1358] 521. Ribosomal protein L31e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L31 [1].
- Chlamydomonas reinhardtii L31.
- Yeast L34.
- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

[1] Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K. Eur. J. Biochem. 162:45-48(1987).
[2] Bergmann U., Arndt E. Biochim. Biophys. Acta 1050:56-60(1990).
[1359] 522. Ribosomal protein L33 signature

Ribosomal protein L33 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L33 has been shown to be on the surface of the 50S subunit. L33 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial L33.
- Algal and plant chloroplast L33.
- Cyanelle L33.

L33 is a small protein of 49 to 66 amino-acid residues. A conserved region located in the central section of L33 has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD]
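By way of illustration only, a consensus pattern written in this notation can be translated into a regular expression and used to scan an amino-acid sequence. The sketch below assumes the usual reading of the notation: x stands for any residue, square brackets list alternative residues, curly braces list excluded residues, and a number or range in parentheses gives a repeat count. The L33 pattern is taken with its range read as x(1,2). The function names and the test sequence are hypothetical and are not part of this specification.

```python
import re

# One pattern element: a residue, 'x', a [set] or an {excluded set},
# optionally followed by a repeat count such as (4) or (1,2).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern (elements joined by '-') into a regex."""
    parts = []
    for element in pattern.strip().split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            atom = "."                        # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            atom = core                       # single residue or [alternatives]
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(atom + repeat)
    return "".join(parts)

def find_matches(pattern: str, sequence: str):
    """Return (start, end) spans of non-overlapping matches in the sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.span() for m in regex.finditer(sequence)]

# L33 signature as read from the text above (assumption: range is x(1,2)).
L33_PATTERN = "Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD]"

# Made-up sequence built to contain one match; not a real L33 protein.
TOY_SEQUENCE = "MAYASAKNGGGGPALEAAKFCQ"

print(prosite_to_regex(L33_PATTERN))
print(find_matches(L33_PATTERN, TOY_SEQUENCE))
```

A match only indicates that a sequence carries the signature; it does not by itself establish membership of the family.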

[1] Kruft V., Kapp U., Wittmann-Liebold B. Biochimie 73:855-860(1991).
[2] Sharp P.M. Gene 139:129-130(1994).
[3] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1360] 523. Ribosomal protein L34 signature

[1361] Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic ribosome. It is a small basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:

- Eubacterial L34.
- Red algal chloroplast L34.
- Cyanelle L34.

A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.

- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R

[1] Old I.G., Margarita D., Saint Girons I. Nucleic Acids Res. 20:6097-6097(1992).

[1362] 524. Ribosomal protein L34e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L34.
- Mosquito L31 [1].
- Plant L34 [2].
- Yeast putative ribosomal protein YIL052c.
- Methanococcus jannaschii MJ0655.

These proteins have 89 to 129 amino-acid residues.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: Y-x-{S"!>x-S-{IMY]-x(5HKR]-T-P-G 

[1] Lan Q., Niu L.L., Fallon A.M. Biochim. Biophys. Acta 1218:460-462(1994).
[2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G. Plant Mol. Biol. 25:761-770(1994).

[1363] 525. Ribosomal protein L35Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate L35A.
- Caenorhabditis elegans L35A (F10E7.7).
- Yeast L37A/L37B (Rp47).
- Pyrococcus woesei L35A homolog [1].

These proteins have 87 to 110 amino-acid residues.

A highly conserved stretch of 22 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: G-K-[U VM}-x-FVx-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P 

[1] Ouzounis C., Kyrpides N., Sander C. Nucleic Acids Res. 23:565-570(1995).
[1364] 526. Ribosomal protein L36 signature

Ribosomal protein L36 is the smallest protein from the large subunit of the prokaryotic ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial L36.
- Algal and plant chloroplast L36.
- Cyanelle L36.

L36 is a small basic and cysteine-rich protein of 37 amino-acid residues. As a signature pattern, a conserved region that corresponds to positions 11 to 36 in L36 and includes three conserved cysteine residues has been developed.

Consensus pattem: C-x(2)-C-x(2)4LIVM]-x-R-x(3)-IUVMNhx4LIVMhx-C-x(3,4)-[^^^ 
[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1365] 527. Ribosomal protein L36e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L36 [1].
- Drosophila L36 (M(1)1B).
- Caenorhabditis elegans L36 (F37C12.4).
- Candida albicans L39.
- Yeast YL39.

These proteins have 99 to 104 amino acids.
A conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR]

[1] Chan Y.-L., Paz V., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 192:849-853(1993).
[1366] 528. Ribosomal protein L39e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L39 [1].
- Plants L39.
- Yeast L46 [2].
- Archaebacterial L39e [3].

These proteins are very basic. About 50 residues long, they are the smallest proteins of eukaryotic-type ribosomes. A conserved region in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R

[1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H., Planta R.J. Nucleic Acids Res. 13:701-709(1985).
[3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

[1367] 529. Ribosomal L40e family

Bovine L40 has been identified as a secondary RNA binding protein [1]. L40 is fused to a ubiquitin protein [2].
Number of members: 27

[1] Medline: 88203200. RNA binding proteins of the large subunit of bovine mitochondrial ribosomes. Piatyszek MA, Denslow ND, O'Brien TW; Nucleic Acids Res 1988;16:2565-2583.
[2] Medline: 96011832. The carboxyl extensions of two rat ubiquitin fusion proteins are ribosomal proteins S27a and L40. Chan YL, Suzuki K, Wool IG; Biochem Biophys Res Commun 1995;215:682-690.
[1368] 530. (Ribosomal L44) Ribosomal protein L44e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian L44 [1].
- Trypanosoma brucei L44.
- Caenorhabditis elegans L44 (C09H10.2).
- Fungal L44 (L41).
- Halobacterium marismortui LA [2].

These proteins have 92 to 105 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C

[1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[2] Bergmann U., Wittmann-Liebold B. Biochim. Biophys. Acta 1173:195-200(1993).

[1369] 531. Ribosomal protein L5 signature

Ribosomal protein L5 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L5.
- Algal chloroplast L5.
- Cyanelle L5.
- Archaebacterial L5.
- Mammalian L11.
- Tetrahymena thermophila L21.
- Slime mold L5 (V18).
- Yeast L16 (39A).
- Plants mitochondrial L5.

L5 is a protein of about 180 amino-acid residues.

A conserved region located in the first third of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-[LIVM]-[STAVC]-[GE]-[QV]-x(2)-[LIVMA]-x-[STC]-x-[STAG]-[KRH]-x-[STA]

[1] Hatakeyama T., Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990).
[2] Rosendahl G., Andreasen P.H., Kristiansen K. Gene 98:161-167(1991).
[3] Yang D., Gunther I., Matheson A.T., Auer J., Spicker G., Boeck A. Biochimie 73:679-682(1991).
[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1370] 532. Ribosomal L5P family C-terminus

[1371] This region is found associated with Ribosomal_L5.
Number of members: 60
[1372] 533. Ribosomal protein L6 signatures

[1373] Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups:

- Eubacterial L6.
- Algal chloroplast L6.
- Cyanelle L6.
- Archaebacterial L6.
- Marchantia polymorpha mitochondrial L6.
- Yeast mitochondrial YmL6 (gene MRPL6).
- Mammalian L9.
- Drosophila L9.
- Plants L9.
- Yeast L9 (YL11).

[1374] While all the above proteins are evolutionarily related, it is very difficult to derive a pattern that will find them all. Two patterns were therefore created, the first to detect eubacterial, cyanelle and mitochondrial L6, the second to detect archaebacterial L6 as well as eukaryotic L9.

- Consensus pattern: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]

- Consensus pattern: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]
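Because the L6 family is covered by two patterns rather than one, a scan has to try both and report which of them matched. The following is a minimal sketch of that idea, assuming the two patterns read as [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM] and Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR] (a reading of the printed patterns, not a verified transcription). The helper names, the group labels and the test sequences are hypothetical.

```python
import re

def to_regex(pattern: str) -> str:
    """Simplified translation, valid only for patterns made of residues,
    'x', [sets] and (n) or (n,m) repeats; excluded sets are not handled."""
    return (pattern.replace("-", "")
                   .replace("x", ".")
                   .replace("(", "{")
                   .replace(")", "}"))

# Patterns as read from the text above (assumed reading, see lead-in).
L6_PATTERNS = {
    "eubacterial/cyanelle/mitochondrial L6":
        "[PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM]",
    "archaebacterial L6 / eukaryotic L9":
        "Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR]",
}

def classify_l6(sequence: str):
    """Return the labels of every L6 pattern found in the sequence."""
    return [label for label, pattern in L6_PATTERNS.items()
            if re.search(to_regex(pattern), sequence)]

# Made-up sequences constructed to satisfy one pattern each.
toy_first = "AAPDAYKGKGLAA"
toy_second = "QAAALAAKAARAFADGIYVAAR"

print(classify_l6(toy_first))
print(classify_l6(toy_second))
print(classify_l6("MKV"))
```

Returning a list rather than a single label keeps the two patterns independent, so a sequence matching neither or both is reported as such.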

[1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990).

[2] Schwank S., Harrer R., Schueller H.-J., Schweizer E. Curr. Genet. 24:136-140(1993).

[3] Golden B.L., Ramakrishnan V., White S.W. EMBO J. 12:4901-4908(1993).

[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1375] 534. Ribosomal protein L6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian ribosomal protein L6 (L6 was previously known as TAX-responsive enhancer element binding protein 107).
- Caenorhabditis elegans ribosomal protein L6 (R151.3).
- Yeast ribosomal protein YL16A/YL16B.
- Mesembryanthemum crystallinum ribosomal protein YL16-like.

These proteins have 175 (yeast) to 287 (mammalian) amino acids. A highly conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[I^V-l-A-T-S-x-K 

[1376] 535. Ribosomal protein L7Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate L7A (SURF3) [1].
- Plant L7A.
- Yeast L7A (YL5) (Rp6).
- Yeast protein NHP2 [2].
- Yeast hypothetical protein YEL026W.
- Bacillus subtilis hypothetical protein ylxQ.
- Halobacterium marismortui Hs6.
- Methanococcus jannaschii MJ1203.

[1377] These proteins have 100 to 265 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G

[1] Colombo P., Yon J., Garson K., Fried M. Proc. Natl. Acad. Sci. U.S.A. 89:6358-6362(1992).
[2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991).

[1378] 536. Ribosomal protein L9 signature

Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial L9.
- Cyanobacterial L9.
- Plant chloroplast L9 (nuclear-encoded).
- Red algal chloroplast L9.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]

[1] Hoffman D.W., Davies C., Gerchman S.E., Kycia J.H., Porter S.J., White S.W., Ramakrishnan V. EMBO J. 13:205-212(1994).
[2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1379] 537. Ribosomal protein S10 signature

Ribosomal protein S10 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S10 is known to be involved in binding tRNA to the ribosomes. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S10.
- Algal chloroplast S10.
- Cyanelle S10.
- Archaebacterial S10.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10.
- Arabidopsis thaliana mitochondrial S10 (nuclear encoded).
- Vertebrate S20.
- Plant S20.
- Yeast URP2.

S10 is a protein of about 100 amino-acid residues.
[1380] A conserved region located in the center of these proteins has been selected as a signature pattern.

- Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1381] 538. Ribosomal protein S11 signature

Ribosomal protein S11 [1] plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on the large lobe of the small ribosomal subunit. S11 belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups [2]:

- Eubacterial S11.
- Algal and plant chloroplast S11.
- Cyanelle S11.
- Archaebacterial S11.
- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.
- Acanthamoeba castellanii mitochondrial S11.
- Neurospora crassa S14 (crp-2).
- Yeast S14 (RP59 or CRY1).
- Mammalian, Drosophila, Trypanosoma, and plant S14.
- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.
- Consensus pattern: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0 J X4PAHSTCHHDNJ

[1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1382] 539. Ribosomal protein S12 signature

Ribosomal protein S12 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S12 is known to be involved in the translation initiation step. It is a very basic protein of 120 to 150 amino-acid residues. S12 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S12.
- Archaebacterial S12.
- Algal and plant chloroplast S12.
- Cyanelle S12.
- Protozoa and plant mitochondrial S12.
- Yeast S28.
- Drosophila mitochondrial protein tko (Technical KnockOut).
- Mammalian S23.

The best conserved regions in these proteins, located in the center of each sequence, have been selected as a signature pattern.

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R

[1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1383] 540. Ribosomal protein S12e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Vertebrate S12 [1].
- Trypanosoma brucei S12 [2].
- Caenorhabditis elegans S12 (F54E7.2).
- Drosophila S12.
- Yeast S12.

These proteins have 130 to 150 amino acids.
A conserved region in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L

[1] Lin A., Chan Y.-L., Jones R., Wool I.G. J. Biol. Chem. 262:14343-14351(1987).
[2] Marchal C., Ismaili N., Pays E. Mol. Biochem. Parasitol. 57:331-334(1993).
[1384] 541. Ribosomal protein S13 signature

Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues and belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S13.
- Plant chloroplast S13 (nuclear encoded).
- Red algal chloroplast S13.
- Cyanelle S13.
- Archaebacterial S13.
- Plant mitochondrial S13.
- Mammalian and plant S18.

The best conserved regions in these proteins, located in their C-terminal part, have been selected as a signature pattern.

- Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q

[1] Chan Y.-L., Paz V., Wool I.G. Biochem. Biophys. Res. Commun. 178:1212-1218(1991).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1385] 542. Ribosomal protein S14p/S29e (Ribosomal protein S14 signature)

[1386] Ribosomal protein S14 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S14.
- Algal and plant chloroplast S14.
- Cyanelle S14.
- Archaebacterial Methanococcus vannielii S14.
- Plant mitochondrial S14.
- Yeast mitochondrial MRP2.
- Mammalian S29.
- Yeast YS29A/B.

[1387] S14 is a protein of 53 to 115 amino-acid residues. Our signature pattern is based on the few conserved positions located in the center of these proteins.

[1388] Consensus pattern: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN]
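Since this pattern contains variable-length gaps, the stretch of sequence it can cover is a range rather than a fixed length. The sketch below shows one way to compute that range, assuming the pattern reads [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN] (a reading of the printed pattern, not a verified transcription). The function name is hypothetical.

```python
import re

# Optional repeat suffix at the end of an element, e.g. "(3)" or "(11,12)".
REPEAT = re.compile(r"\((\d+)(?:,(\d+))?\)$")

def match_length_bounds(pattern: str):
    """Return (shortest, longest) number of residues a pattern can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        m = REPEAT.search(element)
        if m is None:
            low = high = 1                     # plain element: one residue
        else:
            low = int(m.group(1))
            high = int(m.group(2)) if m.group(2) else low
        shortest += low
        longest += high
    return shortest, longest

# S14 pattern as read from the text above (assumed reading, see lead-in).
S14_PATTERN = "[RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN]"

print(match_length_bounds(S14_PATTERN))  # (22, 24)
```

Under that reading the signature spans 22 to 24 residues, which fits within even the shortest S14 proteins of 53 residues mentioned above.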

[1] Chan Y.-L., Suzuki K., Olvera J., Wool I.G. Nucleic Acids Res. 21:649-655(1993).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1389] 543. Ribosomal protein S15 signature

Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. In Escherichia coli, this protein binds to 16S ribosomal RNA and functions at early steps in ribosome assembly. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S15.
- Archaebacterial Halobacterium marismortui HmaS15 (HS11).
- Plant chloroplast S15.
- Yeast mitochondrial S28.
- Mammalian S13.
- Brugia pahangi and Wuchereria bancrofti S13 (S15).
- Yeast S13 (YS15).

S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-[FY]

[1] Dang H., Ellis S.R. Nucleic Acids Res. 18:6895-6901(1990).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1390] 544. Ribosomal protein S16 signature

[1391] Ribosomal protein S16 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:

- Eubacterial S16.
- Algal and plant chloroplast S16.
- Cyanelle S16.
- Neurospora crassa mitochondrial S24 (cyt-21).

[1392] S16 is a protein of about 100 amino-acid residues. A conserved region located in the N-terminal extremity of these proteins has been selected as a signature pattern.
[1393] Consensus pattern: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR]
[1394] [1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1395] 545. Ribosomal protein S17 signature

Ribosomal protein S17 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S17 is known to
bind specifically to the 5' end of 16S ribosomal RNA and is thought to be involved in the recognition of termination
codons. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups:

- Eubacterial S17.
- Plant chloroplast S17 (nuclear encoded).
- Red algal chloroplast S17.
- Cyanelle S17.
- Archaebacterial S17.
- Mammalian and plant cytoplasmic S11.
- Yeast S18a and S18b (RP41; YS12).

The best conserved regions located in the C-terminal sections of these proteins have been selected as a signature
pattern.

- Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S

[ 1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990).
[ 2] Herfurth E., Hirano H., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 372:955-961(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1396] 546. Ribosomal protein S17e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrates S17 [1].
- Drosophila S17 [2].
- Neurospora crassa S17 (crp-3).
- Yeast S17a (RP51A) and S17b (RP51B) [3].
- Methanococcus jannaschii MJ0245.

These proteins have from 63 (in archebacteria) to 130 to 146 amino acids and are highly conserved. A region in the
central part of these proteins has been selected as a signature.

- Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[ 1] Chen I.-T., Roufa D.J. Gene 70:107-116(1988).
[ 2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E., Roufa D.J. Gene 79:289-298(1989).
[ 3] Abovich N., Rosbash M. Mol. Cell. Biol. 4:1871-1879(1984).

[1397] 547. Ribosomal protein S18 signature

Ribosomal protein S18 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S18 has been
involved in aminoacyl-tRNA binding [1]. It appears to be situated at the tRNA A-site of the ribosome. It belongs to a
family of ribosomal proteins which, on the basis of sequence similarities [2], groups:

- Eubacterial S18.
- Algal and plant chloroplast S18.
- Cyanelle S18.

As a signature pattern, a conserved region in the central section of the protein has
been selected. This region contains two basic residues which may be involved in RNA-binding.

- Consensus pattern: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-[GY]-K-[LIVM]-x(3)-R-[LIVMAS]

[ 1] McDougall J., Choli T., Kruft V., Kapp U., Wittmann-Liebold B. FEBS Lett. 245:253-260(1989).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1398] 548. Ribosomal protein S19 signature
Ribosomal protein S19 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S19 is known to
form a complex with S13 that binds strongly to 16S ribosomal RNA. S19 belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S19.
- Algal and plant chloroplast S19.
- Cyanelle S19.
- Archaebacterial S19.
- Plant mitochondrial S19.
- Eukaryotic S15 ('rig' protein).

S19 is a protein of 88 to 144 amino-acid residues. Our signature pattern is based on the few conserved positions
located in the C-terminal section of these proteins.

- Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[L^^

[ 1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto H., Okamoto H. FEBS Lett. 283:210-214(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1399] 549. Ribosomal protein S19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1,2]. One of these families consists of:

- Mammalian S19.
- Drosophila S19.
- Ascaris lumbricoides S19g (ALEP-1) and S19s.
- Yeast YS16 (RP55A and RP55B).
- Aspergillus S16.
- Halobacterium marismortui HS12.

These proteins have 143 to 155 amino acids.

A well conserved stretch of 20 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]

[ 1] Etter A., Aboutanos M., Tobler H., Mueller F. Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991).
[ 2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990).

[1400] 550. Ribosomal protein S2 signatures

Ribosomal protein S2 is one of the proteins from the small ribosomal subunit. S2 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S2.
- Algal and plant chloroplast S2.
- Cyanelle S2.
- Archaebacterial S2.
- Higher eukaryotes P40 (previously thought to be a laminin receptor).
- Yeast NAB1.
- Plant mitochondrial S2.
- Yeast mitochondrial MRP4.

S2 is a protein of 235 to 394 amino-acid residues.

Two conserved regions have been selected as signature patterns. One is located in the N-terminal section and the
other in the central section.

- Consensus pattern: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G
- Consensus pattern: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-[GNQKRH]-[LIVM]-[AP]

[ 1] Davis S.C., Tzagoloff A., Ellis S.R. J. Biol. Chem. 267:5508-5514(1992).
[ 2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. FEBS Lett. 340:133-138(1994).

[1401] 551. Ribosomal protein S21 signature

[1402] Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far S21 has only been
found in eubacteria. It is a protein of 55 to 70 amino-acid residues. A conserved region in the N-terminal section of the
protein has been selected as a signature pattern.
[1403] Consensus pattern: [DE]-x-A-[LIY]-[KR]-R-F-K-[KR]-x(3)-[KR]
[1404] 552. Ribosomal protein S21e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families
consists of:

- Mammalian S21 [1].
- Caenorhabditis elegans S21 (F37C12.11).
- Rice S21 [2].
- Yeast S21 (Ys25) [3].
- Fission yeast S28 [4].

These proteins have 82 to 87 amino acids.

A perfectly conserved nonapeptide in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: L-Y-V-P-R-K-C-S-[SA]

[ 1] Bhat K.S., Morrison S.G. Nucleic Acids Res. 21:2939-2939(1993).
[ 2] Nishi R., Hashimoto H., Uchimiya H., Kato A. Biochim. Biophys. Acta 1216:113-114(1993).
[ 3] Suzuki K., Otaka E. Nucleic Acids Res. 16:6223-6223(1988).
[ 4] Itoh T., Otaka E., Matsui K.A. Biochemistry 24:7418-7423(1985).

[1405] 553. Ribosomal protein S24e signature
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrate S24 [1].
- Yeast Rp50.
- Mucor racemosus S24 [2].
- Halobacterium marismortui HS15 [3].
- Methanococcus jannaschii MJ0394.

These proteins have 101 to 148 amino acids.

A well conserved stretch in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SDN]

[ 1] Brown S.J., Jewell A., Maki C.G., Roufa D.J. Gene 91:293-296(1990).
[ 2] Sosa L., Fonzi W.A., Sypherd P.S.
[1406] Nucleic Acids Res. 17:9319-9331(1989).
[ 3] Kimura J., Arndt E., Kimura M. FEBS Lett. 224:65-70(1987).
[1407] 554. Ribosomal protein S26e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families
consists of:

- Mammalian S26 [1].
- Octopus S26 [2].
- Drosophila S26 (DS31) [3].
- Plant cytoplasmic S26.
- Fungi S26 [4].

These proteins have 114 to 127 amino acids.

A conserved octapeptide in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: [YH]-C-V-S-C-A-I-H

[ 1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. J. Biochem. 97:983-992(1985).
[ 2] Zinov'eva R.D., Tomarev S.I. Dokl. Akad. Nauk SSSR 304:464-469(1989).
[ 3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I. Nucleic Acids Res. 17:2121-2121(1989).
[ 4] Wu M., Tan H. Gene 150:401-402(1994).

[1408] 555. Ribosomal protein S28e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S28 [1].
- Plant S28 [2].
- Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

These proteins have from 64 to 78 amino acids.

A highly conserved nonapeptide from the C-terminal extremity of these proteins has been selected as a signature
pattern.

- Consensus pattern: E-[ST]-E-R-E-A-R-x-L

[ 1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 179:314-318(1991).
[ 2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).
[ 3] Hoekstra R., Ferreira P.M., Bootsman T.C., Mager W.H., Planta R.J. Yeast 8:949-959(1992).

[1409] 556. Ribosomal protein S3Ae signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S3A (was originally known as v-fos transformation effector protein).
- Caenorhabditis elegans S3A (F56F3.5).
- Plant cytoplasmic S3A (CYC07) [1].
- Yeast Rp10 (PLC1 and PLC2).
- Fission yeast Rp10 (SpAC13G6.02c).
- Methanococcus jannaschii MJ0980.

These proteins have from 220 to 250 amino acids.

A conserved stretch in their N-terminal section was selected as a signature pattern.

- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L

[ 1] Liu J.H., Reid D.M. Plant Physiol. 109:338-338(1995).

[1410] 557. Ribosomal protein S3 signature

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S3 is known to be
involved in the binding of initiator Met-tRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1], groups:

- Eubacterial S3.
- Algal and plant chloroplast S3.
- Cyanelle S3.
- Archaebacterial S3.
- Plant mitochondrial S3.
- Vertebrate S3.
- Insect S3.
- Caenorhabditis elegans S3 (C23G10.3).
- Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues.

A conserved region located in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-[DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1411] 558. Ribosomal protein S4 signature

Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind
directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies. It belongs
to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S4.
- Algal and plant chloroplast S4.
- Cyanelle S4.
- Archaebacterial S4.
- Mammalian S9.
- Yeast YS11 (SUP45).
- Marchantia polymorpha mitochondrial S4.
- Dictyostelium discoideum rp1024.
- Yeast protein NAM9 [3]. NAM9 has been characterized as a suppressor for ochre mutations in mitochondrial DNA.
It could be a ribosomal protein that acts as a suppressor by decreasing translation accuracy.

S4 is a protein of 171 to 205 amino-acid residues (except for NAM9 which is much larger). The signature pattern for
this protein is based on a conserved region located in the central section of these proteins.

- Consensus pattern: [LIVMJ^DE^x-R-[LI]-x(3)-[U VMC}iVMF^HQ>[KRT]-x(3)-[STAGC\^]-x-[ST]-x(3)-[SAI]^^^ 
40 X-IUVMF](2) 

[ 1] Mizuta K., Hashimoto T., Suzuki K.I., Otaka E. Nucleic Acids Res. 19:2603-2608(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[ 3] Boguta M., Dmochowska A., Borsuk P., Wrobel K., Gargouri A., Lazowska J., Slonimski P., Szczesniak B.,
Kruszewska A. Mol. Cell. Biol. 12:402-412(1992).

[1412] 559. Ribosomal protein S4e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S4 [1]. Two highly similar isoforms of this protein exist: one coded by a gene on chromosome Y, and
the other on chromosome X.
- Plant cytoplasmic S4 [2].
- Yeast S7 (YS6).
- Archebacterial S4e.

These proteins have 233 to 264 amino acids.

A highly conserved stretch of 15 residues in their N-terminal section has been selected as a signature pattern. Four
positions in this region are positively charged residues.

- Consensus pattern: H-x-K-R-[LIVM]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]

[ 1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., Lawrence J.B., Willard H.F., Bieber F.R.,
Page D.C. Cell 63:1205-1218(1990).
[ 2] Braun H.P., Emmermann M., Mentzel H., Schmitz U.K. Biochim. Biophys. Acta 1218:435-438(1994).

[1413] 560. Ribosomal protein S5 signature

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be
important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase
translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1,2], groups:

- Eubacterial S5.
- Cyanelle S5.
- Red algal chloroplast S5.
- Archaebacterial S5.
- Mammalian S2 (LLrep3).
- Caenorhabditis elegans S2 (C49H3.11).
- Drosophila S2.
- Plant S2.
- Yeast S4 (SUP44).
- Fungi mitochondrial S5.

S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for this protein is based on a conserved region,
rich in glycine residues, and located in the N-terminal section of these proteins.

20 - Consensus pattem: G4KRQpx(3)-[FYhx4ACV]-x(2)-[LIV^HUVM]-[AGHDNhx(2)-G-x-[LIVM]-G-x-(SAG]-x 
(5,6)-[DEQHUVMA]-x(2)-A-[UVMFl 

[ 1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W. Mol. Cell. Biol. 10:6544-6553(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1414] 561. Ribosomal protein S6 signature

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind
together with S18 to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups:

- Eubacterial S6.
- Red algal chloroplast S6.
- Cyanelle S6.

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for this protein is based on a conserved region
located in the N-terminal section of these proteins.

- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]

[1415] 562. Ribosomal protein S6e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian S6 [1].
- Drosophila S6 [2].
- Plant S6 [3].
- Yeast S10 (YS4).
- Halobacterium marismortui HS13 [4].
- Methanococcus jannaschii MJ1260.

S6 is the major substrate of protein
kinases in eukaryotic ribosomes [5]; it may have an important role in controlling cell growth and proliferation through
the selective translation of particular classes of mRNA.

These proteins have 135 to 249 amino acids.

A conserved stretch of 12 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M

[ 1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).
[ 2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J. Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[ 3] Hansen G., Estruch J.J., Spena A. Nucleic Acids Res. 20:5230-5230(1992).
[ 4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. Can. J. Microbiol. 35:195-199(1989).
[ 5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. J. Biol. Chem. 268:4530-4533(1993).

[1416] 563. Ribosomal protein S7 signature

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind
directly to part of the 3' end of 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [1,2,3], groups:

- Eubacterial S7.
- Algal and plant chloroplast S7.
- Cyanelle S7.
- Archaebacterial S7.
- Plant mitochondrial S7.
- Mammalian S5.
- Plant S5.
- Caenorhabditis elegans S5 (T05E11.1).

The best conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

10 - Consensus pattem: [DENSK]-x4LIVMDEThx(3HLI>^FTA](2^x(6)-G-K4KR]-x(5)^LIViy!H-[LIVMFq^^ 
[STAC] 

[ 1] Klussmann S., Franke P., Bergmann U., Kostka S., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 374:305-312(1993).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[ 3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. Nucleic Acids Res. 23:4616-4619(1995).

[1417] 564. Ribosomal protein S7e signature

[1418] A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of
these families consists of:

- Mammalian S7.
- Xenopus S8.
- Insect S7.
- Yeast probable ribosomal protein S7 (N2212).
- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c).

These proteins have about 200 amino acids. A highly conserved stretch of 14 residues which is located in the central
section and which is rich in charged residues was selected as a signature pattern.
[1419] Consensus pattern: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H
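
The stated length of a conserved stretch can be checked against its pattern by summing the positions each element covers. The following is a minimal sketch in Python, not part of the original disclosure; the function name pattern_span and the constant S7E are hypothetical, and the pattern string is the S7e consensus read as [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H.

```python
import re

def pattern_span(pattern):
    """Return (min, max) number of residues a PROSITE-style pattern can cover.

    Hypothetical helper: each dash-separated element counts as one position
    unless it carries a repeat such as (3) or (1,2).
    """
    shortest = longest = 0
    for element in pattern.split("-"):
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat is None:
            low = high = 1
        else:
            low = int(repeat.group(1))
            high = int(repeat.group(2) or repeat.group(1))
        shortest += low
        longest += high
    return shortest, longest

# S7e consensus pattern as read from the text above (assumed transcription).
S7E = "[KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H"
print(pattern_span(S7E))
```

Under this reading the pattern covers exactly 14 positions, which agrees with the "stretch of 14 residues" described in the text.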

[1420] [1] Salazar C.E., Mills-Hamm D.M., Kumar V., Collins F.H. Nucleic Acids Res. 21:4147-4147(1993).
[1421] 565. Ribosomal protein S8 signature

Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind
directly to 16S ribosomal RNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities
[1], groups:

- Eubacterial S8.
- Algal and plant chloroplast S8.
- Cyanelle S8.
- Archaebacterial S8.
- Marchantia polymorpha mitochondrial S8.
- Mammalian S15A.
- Plant S15A.
- Yeast S22 (S24).

The best conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI]

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1422] 566. Ribosomal protein S8e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian S8.
- Caenorhabditis elegans S8 (F42C5.8).
- Leishmania major S8.
- Plant S8.
- Yeast S8 (S14) (Rp19).
- Archebacterial S8e.

These proteins have either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archebacteria). A
conserved stretch which is located in the N-terminal section and which is rich in positively charged residues has been
selected as a signature pattern.

- Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[U^]-G

[ 1] Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B. J. Protein Chem. 14:189-195(1995).

[1423] 567. Ribosomal protein S9 signature

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups:

- Eubacterial S9.
- Algal chloroplast S9.
- Cyanelle S9.
- Archaebacterial S9.
- Mammalian S16.
- Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been
selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1424] 568. Ribulose-phosphate 3-epimerase family signatures
Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme
that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle.
In Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]: one is chromosomally encoded (cbbEC),
the other one is on a plasmid (cbbEP). PPE has been found in a wide range of bacteria, archebacteria, fungi and plants.
The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).
- Escherichia coli protein sgcE.
- Mycoplasma genitalium hypothetical protein MG112.

All these proteins have from 209 to 241 amino acid residues.

Two conserved regions which are located respectively in the N-terminal and in the central part of these proteins have
been selected as signature patterns.

- Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]
- Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC]
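
Consensus patterns of this kind can be translated mechanically into regular expressions for scanning a sequence. The sketch below is illustrative only and is not taken from the original disclosure: the helper prosite_to_regex is a hypothetical name, it handles only the plain element types seen here (a residue, an [allowed] set, a {forbidden} set, x, and repeat counts), and the test sequence is made up.

```python
import re

# One pattern element: a residue, an [allowed] set, a {forbidden} set or x,
# optionally followed by a repeat count such as (3) or (1,2).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported element: %r" % element)
        core, low, high = match.groups()
        if core == "x":
            atom = "."                       # any residue
        elif core[0] == "{":
            atom = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            atom = core                      # literal residue or [set]
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        pieces.append(atom + repeat)
    return "".join(pieces)

# N-terminal PPE pattern as read from the text above (assumed transcription).
PPE_N_TERMINAL = "[LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]"
ppe_regex = prosite_to_regex(PPE_N_TERMINAL)

# Made-up sequence used only to exercise the regex.
print(bool(re.search(ppe_regex, "MKLHIDVMDGHFVPNLSA")))
```

The variable-length element x(1,2) becomes the bounded repeat .{1,2}, so a single regex covers both spacings allowed by the pattern.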

[ 1] Kusian B., Yoo J.G., Bednarski R., Bowien B. J. Bacteriol. 174:7337-7344(1992).

[1425] 569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies.
[1426] This family consists of a triplicated domain involved in cell agglutination in ricin.
[1427] 570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an enzyme that accelerates protein folding
by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. Most characterized PPIases
belong to two families, the cyclophilin-type (see <PDOC00154>) and the FKBP-type (see <PDOC00426>). Recently
a third family has been discovered [2,3]. So far, the only biochemically characterized member of this family is the
Escherichia coli protein parvulin (gene ppiC), a small (92 residues) cytoplasmic enzyme that prefers amino acid residues
with hydrophobic side chains like leucine and phenylalanine in the P1 position of the peptide substrates. PpiC
is evolutionarily related to a number of proteins that are also probably PPIases:

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which contains a periplasmic ppiC-like domain
anchored to the inner membrane and which seems to be involved in the folding of outer membrane proteins.
- Escherichia coli surA. SurA is a periplasmic protein that contains two ppiC-like domains.
- Nitrogen-assimilating bacteria protein nifM which is involved in the activation and stabilization of the iron-component
(nifH) of nitrogenase.
- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in protein export.
- Lactococcus and lactobacillus protease maturation protein prtM, a membrane-bound lipoprotein involved in the
maturation of a secreted serine proteinase.
- Yeast protein ESS1/PTF1 (processing/termination factor 1).
- Drosophila protein dodo (gene dod).
- Mammalian protein PIN1.
- Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen.
- Bacillus subtilis hypothetical protein yacD.
- Helicobacter pylori hypothetical protein HP0175.
- A hypothetical slime mold protein.

A conserved region that contains a serine which could play a role in the catalytic mechanism of these enzymes has
been selected as a signature pattern.

- Consensus pattern: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-[GS]

[ 1] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).
[ 2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S., Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995).
[ 3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J., Mann K., Fischer G. FEBS Lett. 352:160-164(1994).

[1428] 571. (RrnaAD) Ribosomal RNA adenine dimethylases signature

A number of enzymes responsible for the dimethylation of adenosines in ribosomal RNAs (EC 2.1.1.46) have been
found [1,2] to be evolutionarily related. These enzymes are:

- Bacterial 16S rRNA dimethylase (gene ksgA), which acts in the biogenesis of ribosomes by catalyzing the dimethylation
of two adjacent adenosines in the loop of a conserved hairpin near the 3'-end of 16S rRNA. Inactivation of
ksgA leads to resistance to the aminoglycoside antibiotic kasugamycin.
- Yeast 18S rRNA dimethylase (gene DIM1), which is functionally similar to ksgA and which dimethylates twin adenosines
in the 3'-end of 18S rRNA.
- Bacterial 'erm' methylases. These enzymes confer resistance to macrolide-lincosamide-streptogramin B (MLS)
antibiotics - such as erythromycin - by dimethylating the adenine residue at position 2058 of 23S rRNA, thus resulting
in a reduced affinity between ribosomes and the MLS antibiotics.
- Caenorhabditis elegans hypothetical protein E02H1.1.

The best conserved region in these enzymes is located in the N-terminal section and corresponds to a region that is
probably involved in S-adenosyl methionine (SAM) binding.

35 . Consensus pattern: [LIVM]-[LIVMFY]-IDE]-x-G-[STAPVJ-G-x-[GA]-x-[LIVMF]-[ST]-x(2)-[LIVMhx(6)-[LIV^ 
[STAGVJ-[LIVMFYHG]-E-x-D 

[ 1] van Gemen B., van Knippenberg P.H. (In) Nucleic acid methylation, Clawson G.A., Willis D.B., Weissbach A.,
Jones P.A., Eds., pp.19-36, Alan R. Liss Inc., New-York, (1990).
[ 2] Lafontaine D., Delcour J., Glasser A.L., Desgres J., Vandenhaute J. J. Mol. Biol. 241:492-497(1994).

[1429] 572. (RuBisCO small) Ribulose bisphosphate carboxylase, small chain. 206 members
[1430] 573. ATP/GTP-binding site motif A (P-loop) (ras)

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best
conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an
alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in
which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has
been noted are listed below:

- ATP synthase alpha and beta subunits.
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins and dynamin-like proteins.
- Guanylate kinase.
- Thymidine kinase.
- Thymidylate kinase.
- Shikimate kinase.
- Nitrogenase iron protein family (nifH/frxC).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7].
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran.
- ADP-ribosylation factors family.
- Bacterial dnaA protein.
- Bacterial recA protein.
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family.
- Bacterial type II secretion system protein E.

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of
proteins escape detection because the structure of their ATP-binding site is completely different from that of the
P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins
the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must
be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is
found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]

In addition to the proteins listed above, the 'A' motif is also found in a number of other proteins. Most of these proteins
probably bind a nucleotide, but others are definitely not ATP- or GTP-binding (as for example chymotrypsin, or human
ferritin light chain).
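
Because the 'A' motif is short, a scan reports every position where it occurs, whether or not the protein actually binds a nucleotide. The following is a minimal sketch in Python, not part of the original disclosure: the function name find_p_loops is hypothetical, both sequences are made-up fragments, and the regex is a direct transcription of [AG]-x(4)-G-K-[ST].

```python
import re

# [AG]-x(4)-G-K-[ST] as a regex; the lookahead lets overlapping hits be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")

def find_p_loops(sequence):
    """Return (1-based start, matched residues) for every P-loop hit."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(sequence)]

# Made-up fragment containing one canonical P-loop.
canonical = "MSDKLVIFGPAGSGKSTLARH"
# Made-up fragment ending the loop with Gly instead of Ser/Thr, mimicking
# the adenylate kinase deviation described in the text.
gly_variant = "GGPGSGKGT"

print(find_p_loops(canonical))
print(find_p_loops(gly_variant))
```

A hit from such a scan is only a candidate: as noted above, some matching proteins do not bind ATP or GTP, and some nucleotide-binding proteins carry a variant loop that the pattern misses.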

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[1431] GTP-binding nuclear protein ran signature (ras)

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a
large number of processes including nucleocytoplasmic transport, RNA synthesis, processing and export and cell cycle
checkpoint control [1,2]. Ran is generally included in the RAS 'superfamily' of small GTP-binding proteins [3], but it is
only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its
C-terminal and is therefore not subject to prenylation. Instead ran has an acidic C-terminus. It is, however, similar to
RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating
protein (GAP) as stimulators of overall GTPase activity. The region of the GTP-binding B motif which, in ran, is perfectly
conserved has been selected as a signature pattern.

Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y

Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
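
Since members of this family are expected to carry both the ran signature and the 'A' motif, a sequence can be screened for the two patterns together. The sketch below is an illustration under stated assumptions rather than the procedure of the original disclosure: the names SIGNATURES and signatures_found are hypothetical, and both toy sequences were assembled by hand for the example.

```python
import re

# Regex transcriptions of the two consensus patterns given in the text.
SIGNATURES = {
    "ran": re.compile(r"DTAGQEK[LF]GGLR[DE]GYY"),
    "P-loop": re.compile(r"[AG].{4}GK[ST]"),
}

def signatures_found(sequence):
    """Report, per signature name, whether the sequence contains a match."""
    return {name: bool(rx.search(sequence)) for name, rx in SIGNATURES.items()}

# Hypothetical toy sequences, assembled only for illustration.
toy_ran_like = "MAAKLVLVGDGGTGKTTFVKRHDTAGQEKFGGLRDGYYIQ"
toy_p_loop_only = "MSKLVLVGDGGTGKTTFVKRH"

print(signatures_found(toy_ran_like))
print(signatures_found(toy_p_loop_only))
```

A sequence matching only the P-loop would be consistent with a nucleotide-binding protein in general, whereas a match to both patterns is what the text describes for ran.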
[ 1] Scheffzek K., Klebe G., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995).
[ 2] Rush M.G., Drivas G., D'Eustachio P. BioEssays 18:103-112(1996).
[ 3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[1432] 574. recA signature

The bacterial recA protein [1,2,3,E1] is essential for homologous recombination and recombinational repair of DNA
damage. RecA has many activities: it filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes
ATP, it is also a recombinase and, finally, it interacts with lexA causing its activation and leading to its autocatalytic
cleavage. RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved [3,4,5,E1] among
eubacterial species. It is also found in the chloroplast of plants [6]. The best conserved region, a nonapeptide located
in the middle of the sequence which is part of the monomer-monomer interface in a recA filament, has been selected
as a signature pattern.

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R

[ 1] Smith K.C., Wang T.-C.V. BioEssays 10:12-16(1989).
[ 2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993).
[ 3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997).
[ 4] Karlin S., Weinstock G.M., Brendel V. J. Bacteriol. 177:6881-6893(1995).
[ 5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995).
[ 6] Cerutti H.D., Osman M., Grandoni P., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992).
[E1] http://www.tigr.org/~jeisen/RecA/RecA.html

[1433] 575. Response regulator receiver domain
This domain receives the signal from the sensor partner in bacterial two-component systems. It is usually found N-terminal to a DNA binding effector domain.
[1] Pao GM, Saier MH; J Mol Evol 1995;40:136-154.
[1434] 576. Ribonucleotide reductase large subunit signature

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are regions of similarity in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of these regions has been selected as a signature pattern.

[1435] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]
[1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). [2] Reichard P. Science 260:1773-1777(1993).

[1436] 577. Ribonuclease T2 family histidine active sites

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are structurally and functionally related 30 Kd glycoproteins [1] that cleave the 3'-5' internucleotide linkage of RNA via a nucleotide 2',3'-cyclic phosphate intermediate (EC 3.1.27.1). A number of other RNAses have been found to be evolutionarily related to these fungal enzymes:
- Self-incompatibility [2] in flowering plants is often controlled by a single gene (S-gene) that has several alleles. This gene prevents fertilization by self-pollen or by pollen bearing either of the two S-alleles expressed in the style. The self-incompatibility glycoprotein from several higher plants of the Solanaceae family has been shown [2,3] to be a ribonuclease.
- Phosphate-starvation induced RNAses LE and LX from tomato [4]. These two enzymes are probably involved in a phosphate-starvation rescue system.
- Escherichia coli periplasmic RNAse I (EC 3.1.27.6) (gene rna) [5].
- Aeromonas hydrophila periplasmic RNAse.
- Haemophilus influenzae hypothetical protein HI0526.
Two histidine residues have been shown [6,7] to be involved in the catalytic mechanism of RNase T2 and Rh. These residues and the region around them are highly conserved in all the sequences described above. Two signature patterns have been developed, one for each of the two active-site histidines. The second pattern also contains a cysteine which is known to be involved in a disulfide bond.
Consensus pattern: [FYWL]-x-[LIVM]-H-G-L-W-P [H is an active site residue]

Consensus pattern: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C [H is an active site residue] [C is involved in a disulfide bond]
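Because the first pattern fixes the active-site histidine at its fourth position, a match also locates that residue in the sequence. The sketch below is a hand translation of the first pattern, [FYWL]-x-[LIVM]-H-G-L-W-P, into a Python regular expression; the constant and function names and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# Hand-written regex for [FYWL]-x-[LIVM]-H-G-L-W-P.
T2_SITE_1 = re.compile(r"[FYWL].[LIVM]HGLWP")
H_OFFSET = 3  # the histidine is the 4th pattern position (0-based offset 3)

def active_site_histidines(sequence: str):
    """Return 1-based positions of candidate active-site histidines."""
    return [m.start() + H_OFFSET + 1 for m in T2_SITE_1.finditer(sequence)]

toy = "AAFQLHGLWPAA"  # invented sequence, not a real RNase
for pos in active_site_histidines(toy):
    print(pos, toy[pos - 1])
```

The same approach would apply to the second pattern, with offsets for both its histidine and its disulfide-bonded cysteine.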

[1] Watanabe H., Naitoh A., Suyama Y., Inokuchi N., Shimada H., Koyama T., Ohgi K., Irie M. J. Biochem. 108:303-310(1990). [2] Haring V., Gray J.E., McClure B.A., Anderson M.A., Clarke A.E. Science 250:937-941(1990). [3] McClure B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama R., Clarke A.E. Nature 342:955-957(1989). [4] Loeffler A., Glund K., Irie M. Eur. J. Biochem. 214:627-633(1993). [5] Meador J. III, Kennell D. Gene 95:1-7(1990). [6] Kawata Y., Sakiyama R., Hayashi R., Kyogoku Y. Eur. J. Biochem. 187:255-262(1990). [7] Kurihara H., Mitsui Y., Ohgi K., Irie M., Mizuno H., Nakamura K.T. FEBS Lett. 306:189-192(1992).

[1437] 578. Ribonucleotide reductase large subunit signature. Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are regions of similarity in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of these regions has been developed as a signature pattern.
[1438] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]
[1439] [1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988). [2] Reichard P. Science 260:1773-1777(1993).

[1440] 579. RNase H

RNase H digests the RNA strand of an RNA/DNA hybrid. It is an important enzyme in the retroviral replication cycle, and is often found as a domain associated with reverse transcriptases. The structure is a mixed alpha+beta fold with three a/b/a layers.
[1441] 580. Eukaryotic putative RNA-binding region RNP-1 signature (rrm)

Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain one or more copies of a putative RNA-binding domain of about 90 amino acids [1,2]. This region has been found in the following proteins:
** Heterogeneous nuclear ribonucleoproteins **
- hnRNP A1 (helix destabilizing protein) (twice).
- hnRNP A2/B1 (twice).
- hnRNP C (C1/C2) (once).
- hnRNP E (UP2) (at least once).
- hnRNP G (once).
** Small nuclear ribonucleoproteins **
- U1 snRNP 70 Kd (once).
- U1 snRNP A (once).
- U2 snRNP B'' (once).
** Pre-RNA and mRNA associated proteins **
- Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential for the binding of mRNA to ribosomes (once).
- Nucleolin (4 times).
- Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once).
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it specifically binds nuclear localization sequences.
- Poly(A) binding protein (PABP) (4 times).
** Others **
- Drosophila sex determination protein Sex-lethal (Sxl) (twice).
- Drosophila sex determination protein Transformer-2 (Tra-2) (once).
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of neurons.
- Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is highly similar to elav and which may play a role in neuron-specific RNA processing.
- Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein that may also bind to specific mRNAs.
- La antigen (once), a protein which may play a role in the transcription of RNA polymerase III.
- The 60 Kd Ro protein (once), a putative RNP complex protein.
- A maize protein induced by abscisic acid in response to water stress, which seems to be an RNA-binding protein.
- Three tobacco proteins, located in the chloroplast [6], which may be involved in splicing and/or processing of chloroplast RNAs (twice).
- X16 [7], a mammalian protein which may be involved in RNA processing in relation with cellular proliferation and/or maturation.
- Insulin-induced growth response protein Cl-4 from rat (twice).
- Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic activity against cytotoxic lymphocyte target cells and may be involved in apoptosis.
- Yeast RNA15 protein, which plays a role in mRNA stability and/or poly-(A) tail length [9].
Inside the putative RNA-binding domain there are two regions which are highly conserved. The first one is a hydrophobic segment of six residues (which is called the RNP-2 motif), the second one is an octapeptide motif (which is called RNP-1 or RNP-CS). The position of both motifs in the domain is shown in the following schematic representation:
[1442]
xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx########x...x
       RNP-2                              RNP-1

The RNP-1 motif has been used as a signature pattern for this type of domain.
Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]
In most cases the residue in position 3 of the pattern is either Tyr or Phe.
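The RNP-1 pattern includes an exclusion set at position 3 (residues in curly braces are not allowed there). As a hedged illustration, the Python below reads the pattern as [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM], translates the exclusion into a negated character class, and reports whether position 3 of each hit is Tyr or Phe. The pattern reading, the names, and the toy sequences are assumptions made for this sketch.

```python
import re

# Assumed reading of the RNP-1 octapeptide; {..} becomes a negated class [^..].
RNP1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYLM]")

def rnp1_hits(sequence: str):
    """Return (1-based start, octapeptide, position-3-is-aromatic) per hit."""
    hits = []
    for m in RNP1.finditer(sequence):
        octapeptide = m.group()
        hits.append((m.start() + 1, octapeptide, octapeptide[2] in "YF"))
    return hits

print(rnp1_hits("AARGFGFVTFAA"))  # invented sequence with Phe at position 3
print(rnp1_hits("KGAAFIEM"))      # invented sequence with Ala at position 3
```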

[1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989). [2] Dreyfuss G., Swanson M.S., Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988). [3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J. EMBO J. 9:2783-2790(1990). [4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner J.B., Furneaux H.M. Cell 67:325-333(1991). [5] Rebagliati M. Cell 58:231-232(1989). [6] Li Y., Sugiura M. EMBO J. 9:3059-3066(1990). [7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991). [8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992). [9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991).

[1443] 581. Rubredoxin signature

[1444] Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron atom which is ligated by four cysteine residues. Rubredoxins are, in some cases, functionally interchangeable with ferredoxins.
[1445] A conserved region that includes two of the cysteine residues that bind the iron atom has been selected as a pattern for these proteins.

[1446] Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two C's bind the iron atom]

In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because alkG has two rubredoxin domains.
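A protein with two rubredoxin domains, as described for alkG, would be expected to yield two hits. The following Python is a small sketch under that assumption: it reads the pattern as [LIVM]-x(3)-W-x-C-P-x-C-[AGD], scans for every hit, and reports the positions of the two iron-binding cysteines in each. The sequence is invented for illustration and is not the alkG sequence.

```python
import re

# Assumed reading of the rubredoxin pattern as a regex.
RUBREDOXIN = re.compile(r"[LIVM].{3}W.CP.C[AGD]")
CYS_OFFSETS = (6, 9)  # 0-based offsets of the two cysteines within a hit

def rubredoxin_sites(sequence: str):
    """Return one (Cys1, Cys2) tuple of 1-based positions per pattern hit."""
    return [
        tuple(m.start() + off + 1 for off in CYS_OFFSETS)
        for m in RUBREDOXIN.finditer(sequence)
    ]

toy = "MLKDEWTCPQCG" + "SSSS" + "IAAAWKCPECA"  # two invented copies of the motif
sites = rubredoxin_sites(toy)
print(len(sites), sites)
```

As the text notes for rubrerythrin, a divergent rubredoxin-like domain would simply produce no hit with this pattern.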

Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio vulgaris, possesses a C-terminal rubredoxin-like domain, but this domain is too divergent to be detected by the above pattern.
[1447] [1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp1-66, Wiley, New-York, (1982). [2] Kok M., Oldenhuis R., der Linden M.P.G., Meulenberg C.H.C., Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989). [3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991).
[1448] 582. (rvp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Currently known eukaryotic aspartyl proteases are:
- Vertebrate gastric pepsins A and C (also known as gastricsin).
- Vertebrate chymosin (rennin), involved in digestion and used for making cheese.
- Vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34).
- Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma.
- Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21).
- Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases.
- Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone.
- Fission yeast sxa1, which is involved in degrading or processing the mating pheromones.
Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein.
Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of protease.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]
[1] Foltmann B. Essays Biochem. 17:52-84(1981). [2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990). [3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[1449] 583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase)

A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. Number of members: 1233
[1450] [1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse transcriptase sequences. Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362.
[1451] 584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP [1]. AdoMet is an important methyl donor for transmethylation and is also the propylamine donor in polyamine biosynthesis. In bacteria there is a single isoform of AdoMet synthetase (gene metK), there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene family. The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. Two signature patterns have been selected for this type of enzyme; the first is a hexapeptide which seems to be involved in ATP-binding; the second is an almost perfectly conserved glycine-rich nonapeptide.

Consensus pattern: G-A-G-D-Q-G-x(3)-G-[FYH]
Consensus pattern: G-[GA]-G-[ASC]-F-S-x-K-[DE]
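When a family is described by two signatures, one simple screening rule is to require both. The Python below sketches that rule; it is an illustrative assumption, not a procedure stated in the text. The first pattern is read here as G-A-G-D-Q-G-x(3)-G-[FYH], and the dictionary keys, function names, and toy sequences are hypothetical.

```python
import re

# Hand translations of the two patterns; the first reading is an assumption.
SIGNATURES = {
    "signature_1": re.compile(r"GAGDQG.{3}G[FYH]"),
    "signature_2": re.compile(r"G[GA]G[ASC]FS.K[DE]"),
}

def signature_hits(sequence: str) -> dict:
    """Report, per signature name, whether the sequence contains a hit."""
    return {name: bool(rx.search(sequence)) for name, rx in SIGNATURES.items()}

def has_both_signatures(sequence: str) -> bool:
    """True only when every signature is found at least once."""
    return all(signature_hits(sequence).values())

both = "MGAGDQGAAAGFTTTGGGAFSGKDEE"  # invented sequence carrying both motifs
only_first = "MGAGDQGAAAGFTTT"       # invented sequence carrying one motif
print(signature_hits(both), has_both_signatures(both))
print(signature_hits(only_first), has_both_signatures(only_first))
```

Requiring both hits is stricter than accepting either one; which rule is appropriate depends on how many false positives can be tolerated.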

[1] Horikawa S., Sasuga J., Shimizu K., Ozasa H., Tsukada K. J. Biol. Chem. 265:13683-13686(1990).

[1452] 585. S1 RNA binding domain

The S1 domain occurs in a wide range of RNA associated proteins. It is structurally similar to cold shock protein, which binds nucleic acids. The S1 domain has an OB-fold structure.
[1] Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG; Cell 1997;88:235-242.
[1453] 586. SAICAR synthetase signatures

Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6) (SAICAR synthetase) catalyzes the seventh step in the de novo purine biosynthetic pathway: the ATP-dependent conversion of 5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and aspartic acid to SAICAR [1]. In bacteria (gene purC), fungi (gene ADE1) and plants, SAICAR synthetase is a monofunctional protein; in higher vertebrates it is the N-terminal domain of a bifunctional enzyme that also catalyzes phosphoribosylaminoimidazole carboxylase (AIRC) activity. Two conserved regions in the central section of this enzyme have been selected as signature patterns for SAICAR synthetase.
Consensus pattern: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S
Consensus pattern: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F
[1] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[1454] 587. (SCP) Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signatures

A variety of extracellular proteins from eukaryotes have been found to be evolutionarily related:
- Rodent sperm-coating glycoprotein (SCP), also known as acidic epididymal glycoprotein (AEG). This protein is thought to be involved in sperm maturation [1]. It is a protein of about 220 residues and probably contains eight disulfide bonds.
- Mammalian testis-specific protein Tpx-1 [2]. Tpx-1 is highly related to SCP's.
- Mammalian glioma pathogenesis-related protein (GliPR).
- Lizard helothermine, a toxin that blocks ryanodine receptors.
- Venom allergen 5 (Ag5) from vespid wasps and venom allergen 3 (Ag3) from fire ants. These proteins are potent allergens and are the main cause of allergic reactions to stings from insects of the hymenoptera family [3]. Ag5/3 are proteins of about 200 residues and contain four disulfide bonds.
- Plant pathogenesis proteins of the PR-1 family [4]. These proteins are synthesized during pathogen infection or other stress-related responses. They are proteins of about 130 to 140 residues and probably contain three disulfide bonds.
- Proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune. These extracellular proteins are loosely associated with fruit body hyphal walls [5]. Sc7/14 are proteins of about 180 residues and probably contain two disulfide bonds.
- Ancylostoma secreted protein from dog hookworm.
- Yeast hypothetical proteins YJL078C, YJL079C and YKR013w.
The exact function of these proteins is not yet known. Two conserved regions located in their C-terminal half have been selected as signature patterns. The second signature contains a cysteine which is known to be involved in a disulfide bond in Ag5.

Consensus pattern: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN]

Consensus pattern: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN] [C is involved in a disulfide bond]

[1] Mizuki N., Kasahara M. Mol. Cell. Endocrinol. 89:25-32(1992). [2] Kasahara M., Gutknecht J., Brew K., Spurr N., Goodfellow P.N. Genomics 5:527-534(1989). [3] Lu G., Villalba M., Coscia M.R., Hoffman D.R., King T.P. J. Immunol. 150:2823-2830(1993). [4] Dixon D.C., Cutt J.R., Klessig D.F. EMBO J. 10:1317-1324(1991). [5] Schuren F.H.J., Asgeirsdottir S.A., Kothe E.M., Scheer J.M.J., Wessels J.G.H. J. Gen. Microbiol. 139:2083-2090(1993).
[1455] 588. SET domain

SET domains appear to be protein-protein interaction domains. It has been demonstrated that SET domains mediate interactions with a family of proteins that display similarity with dual-specificity phosphatases (dsPTPases) [2].

[1] Tripoulas N, LaJeunesse D, Gildea J, Shearn A; Genetics 1996;143:913-928. [2] Cui X, De Vivo I, Slany R, Miyamoto A, Firestein R, Cleary ML; Nat Genet 1998;18:331-337.
[1456] 589. Src homology 3 (SH3) domain profile

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino-acid residues first identified as a conserved sequence in the non-catalytic part of several cytoplasmic protein tyrosine kinases (e.g. Src, Abl, Lck) [1]. Since then, it has been found in a great variety of other intracellular or membrane-associated proteins [2,3,4,5]. The SH3 domain has a characteristic fold which consists of five or six beta-strands arranged as two tightly packed anti-parallel beta sheets. The linker regions may contain short helices [6]. The function of the SH3 domain is not well understood. The current opinion is that SH3 domains mediate assembly of specific protein complexes via binding to proline-rich peptides [7]. In general SH3 domains are found as single copies in a given protein, but there is a significant number of proteins with two SH3 domains and a few with 3 or 4 copies. So far, SH3 domains have been identified in the following proteins:
- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases, in particular in the Src, Abl, Btk, Csk and ZAP70 families of kinases.
- Mammalian phosphatidylinositol-specific phospholipase C-gamma-1 and -2.
- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.
- Mammalian Ras GTPase-activating protein (GAP).
- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK, all of which have two SH3 domains.
- Mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family.
- Some guanine-nucleotide releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast ste6.
- MAGUK proteins. These proteins consist of at least three types of domains: one or more copies of the DHR domain, a SH3 domain and a C-terminal guanylate kinase domain. Members of this family are: Drosophila lethal(1)discs large-1 tumor suppressor protein (gene Dlg1), mammalian tight junction protein ZO-1, vertebrate erythrocyte membrane protein p55, Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102.
- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: mammalian cytoplasmic protein Nck (3 copies), oncoprotein Crk (2 copies).
- Chicken Src substrate p80/85 protein (cortactin) and the similar human hemopoietic lineage cell specific protein Hs1.
- Mammalian dihydropyridine-sensitive L-type calcium channel beta (regulatory) subunit, including the related human myasthenic syndrome antigen B (MSYB).
- Mammalian neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2), and a potential homolog from Caenorhabditis elegans (B0303.7). NCF-1 and -2 have two copies of the SH3 domain, while B0303.7 has four.
- Some myosin heavy chains from amoebae, slime molds and yeast (gene MYO3).
- Vertebrate and Drosophila spectrin and fodrin alpha-chain.
- Human amphiphysin.
- Yeast actin-binding protein ABP1.
- Yeast actin-binding protein SLA1 (3 copies).
- Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 copies).
- Yeast BEM1-binding proteins BOI2 (BEB1) and BOB1 (BOI1).
- Yeast fusion protein FUS1.
- Yeast protein RVS167.
- Yeast protein SSU81.
- Yeast hypothetical proteins YAR014C (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020C (1 copy), YHR114W (2 copies) and the fission yeast homolog SpAC12C2.05c.
- Caenorhabditis elegans hypothetical protein F42H10.3.
The profile developed to detect SH3 domains is based on a structural alignment consisting of 5 gap-free blocks and 4 linker regions totaling 62 match positions.

[1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988). [2] Musacchio A., Gibson T., Lehto V.P., Saraste M. FEBS Lett. 307:55-61(1992). [3] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993). [4] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993). [5] Pawson T. Nature 373:573-580(1995). [6] Kuriyan J., Cowburn D. Curr. Opin. Struct. Biol. 3:828-837(1993). [7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).
[1457] 590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT)
Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the hydroxymethyl group of serine to tetrahydrofolate to form 5,10-methylenetetrahydrofolate and glycine. In vertebrates, it exists in a cytoplasmic and a mitochondrial form, whereas only one form is found in prokaryotes. Serine hydroxymethyltransferase is a pyridoxal-phosphate containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme.

Consensus pattern: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA] [K is the pyridoxal-P attachment site]

[1] Usha R., Savithri H.S., Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994).
[1458] 591. SIS domain

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins.
[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure 1998;6:1047-1055.

[1459] 592. (SKI) Shikimate kinase signature

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroK or aroL), plants and in fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway). Shikimate kinase is a small protein of about 200 residues. A conserved region that contains a run of three glycines has been selected as a signature pattern.

Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]
Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
[1460] 593. SNAP-25 family

[1461] SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes. Members of this family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment [2].
[1462] [1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell 1994;79:245-258. [2] Risinger C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D, Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:24408-24414.

[1463] 594. SNF2 and others N-terminal domain

[1464] This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI), as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1).

[1465] 595. Staphylococcal nuclease homologues (SNase)
Present in all three domains of cellular life. Four copies in the transcriptional coactivator p100. These, however, appear to lack the active site residues of Staphylococcal nuclease. Positions 14 (Asp-21), 34 (Arg-35), 39 (Asp-40), 42 (Glu-43) and 110 (Arg-87) [SNase numbering in parentheses] are thought to be involved in substrate-binding and catalysis.

[1] Ponting CP; Protein Sci 1997;6:459-463. [2] Callebaut I, Mornon JP; Biochem J 1997;321:125-132.
[1466] 596. SPRY domain

The SPRY domain is named from SPla and the RYanodine Receptor. Domain of unknown function. Distant homologues are domains in butyrophilin/marenostrin/pyrin homologues.
[1] Ponting C, Schultz J, Bork P; Trends Biochem Sci 1997;22:193-194.

[1467] 597. (SQS PSY) Squalene and phytoene synthases signatures

Two different polyisoprene synthases have been shown [1,2,3] to share a number of regions of sequence similarity:
- Squalene synthase (EC 2.5.1.21) (farnesyl-diphosphate farnesyltransferase) (SQS), which catalyzes the conversion of two molecules of farnesyl diphosphate (FPP) into squalene. It is the first committed step in the cholesterol biosynthetic pathway. The reaction carried out by SQS is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of FPP to form presqualene diphosphate; this intermediate is then rearranged in a NADP-dependent reduction to form squalene. SQS is found in eukaryotes. In yeast it is encoded by the ERG9 gene, in mammals by the FDFT1 gene. SQS seems to be membrane-bound.
- Phytoene synthase (EC 2.5.1.-) (PSY), which catalyzes the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into phytoene. It is the second step in the biosynthesis of carotenoids from isopentenyl diphosphate. The reaction carried out by PSY is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of GGPP to form prephytoene diphosphate; this intermediate is then rearranged to form phytoene. PSY is found in all organisms that synthesize carotenoids: plants and photosynthetic bacteria as well as some non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by the gene crtB. In plants PSY is localized in the chloroplast.
As can be seen from the description above, SQS and PSY share a number of functional similarities which are also reflected at the level of their primary structure. In particular, three well conserved regions are shared by SQS and PSY; they could be involved in substrate binding and/or the catalytic mechanism. Signature patterns have been developed for the second and third conserved regions; they are localized in the central part of these enzymes.

Consensus pattern: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV]
Consensus pattern: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P
[1] Summers C., Karst F., Charles A.D. Gene 136:185-192(1993). [2] Robinson G.W., Tsay Y.H., Kienzle B.K., Smith-Monroy C.A., Bishop R.W. Mol. Cell. Biol. 13:2706-2727(1993). [3] Roemer S., Hugueney P., Bouvier F., Camara B., Kuntz M. Biochem. Biophys. Res. Commun. 196:1414-1421(1993).
[1468] 598. SRP54-type proteins GTP-binding domain signature

The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a 7S RNA and six protein subunits. One of these subunits, the 54 Kd protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. The N-terminal 300 residues of SRP54 include the GTP-binding site (G-domain) and are evolutionarily related to similar domains in other proteins which are listed below [1].
- Escherichia coli and Bacillus subtilis ffh protein (P48), a protein which seems to be the prokaryotic counterpart of SRP54. Ffh is associated with a 4.5S RNA in the prokaryotic SRP complex.
- Signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane. The G-domain is located at the C-terminal extremity of the protein.
- Bacterial ftsY protein, a protein which is believed to play a similar role to that of the docking protein in eukaryotes. The G-domain is located at the C-terminal extremity of the protein.
- The pilA protein from Neisseria gonorrhoeae, which seems to be the homolog of ftsY.
- A protein from the archaebacterium Sulfolobus solfataricus. This protein is also believed to be a docking protein. The G-domain is also at the C-terminus.
- Bacterial flagellar biosynthesis protein flhF.
The best conserved regions in those domains are the sequence motifs that are part of the GTP-binding site, but as those regions are not specific to these proteins, they were not used as a signature pattern. Instead, a conserved region located at the C-terminal end of the domain was selected.

Consensus pattern: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]
[1] Althoff S., Selinger D., Wise J.A. Nucleic Acids Res. 22:1933-1947(1994).

[1469] 599. (STphosphatase) Serine/threonine specific protein phosphatases signature 

Serine/threonine specific protein phosphatases (EC 3.1.3.16) (PP) [1,2,3] are enzymes that catalyze the removal of a
phosphate group attached to a serine or a threonine residue. The enzymes listed below are evolutionarily related. - Protein
phosphatase-1 (PP1) is an enzyme of broad
specificity. It is inhibited by two thermostable proteins, inhibitor-1 and -2. In mammals, there are two closely related
isoforms of PP-1: PP-1alpha and PP-1beta, produced by alternative splicing of the same gene. In Emericella nidulans,
PP-1 (gene bimG) plays an important role in mitosis control by reversing the action of the nimA kinase. In yeast, PP-1
(gene SIT4) is involved in dephosphorylating the large subunit of RNA polymerase II. - Protein phosphatase-2A
(PP2A) is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core composed of a catalytic
subunit associated with a 65 Kd regulatory subunit and a third variable subunit. In mammals, there are two closely
related isoforms of the catalytic subunit of PP2A: PP2A-alpha and PP2A-beta, encoded by separate genes. - Protein
phosphatase-2B (PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by calmodulin. It is
composed of two subunits: the catalytic A-subunit and the calcium-binding B-subunit. The specificity of PP2B is restricted.
In addition to the above-mentioned enzymes, some additional serine/threonine specific protein phosphatases
have been characterized and are listed below. - Mammalian phosphatase-X (PP-X), and Drosophila phosphatase-V
(PP-V), which are closely related but yet distinct from PP2A. - Yeast phosphatase PPH3, which is similar to PP2A, but
with different enzymatic properties. - Drosophila phosphatase-Y (PP-Y), and yeast phosphatases Z1 and Z2 (genes
PPZ1 and PPZ2), which are closely related but yet distinct from PP1. - Drosophila retinal degeneration protein C (gene
rdgC), a calcium-binding phosphatase required to prevent light-induced retinal degeneration. - Phages Lambda and
Phi-80 ORF-221, which have been shown to have phosphatase activity and are related to mammalian PP's. The best
conserved region in these proteins is a perfectly conserved pentapeptide that can be used as a signature pattern.
Consensus pattern: [LIVM]-R-G-N-H-E

[1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989). [2] Cohen P., Cohen P.T.W. J. Biol. Chem. 264:21435-21438
(1989). [3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J. FEBS Lett. 268:355-359(1990).
[1470] 600. Translation initiation factor SUI1 signature 

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-2
and the initiator tRNA-Met in directing the ribosome to the proper start site of translation [1]. SUI1 is a protein of 108
residues. Close homologs of SUI1 have been found [2] in mammals, insects and plants. SUI1 is also evolutionarily
related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus
vannielii. A conserved region in the C-terminal section has been selected as a signature pattern.
Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]

[1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992). [2] Fields C.A., Adams M.D. Biochem. Biophys. Res.
Commun. 198:288-291(1994).

[1471] 601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related pyridoxal-phosphate dependent enzymes:
- L-serine dehydratase (EC 4.2.1.13) and D-serine dehydratase (EC 4.2.1.14) catalyze the dehydration of L-serine
(respectively D-serine) into ammonia and pyruvate. - Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the
dehydration of threonine into alpha-ketobutyrate and ammonia. In Escherichia coli and other microorganisms, two
classes of TDH are known to exist. One is involved in the biosynthesis of isoleucine, the other in hydroxyamino acid
catabolism. - Threonine synthase (EC 4.2.99.2) is also a pyridoxal-phosphate enzyme. It catalyzes the transformation
of homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is distantly related to the serine/
threonine dehydratases. In all these enzymes, the pyridoxal-phosphate group is attached to a lysine residue. The
sequence around this residue is sufficiently conserved to allow the derivation of a pattern specific to serine/threonine
dehydratases and threonine synthases.

Consensus pattern: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA] [The K is the pyridoxal-P
attachment site]

[1] Ogawa H., Gomi T., Konishi K., Date T., Nakashima H., Hose K., Matsuda Y., Peraino C., Pitot H.C., Fujioka M.
J. Biol. Chem. 264:15818-15823(1989). [2] Datta P., Goss T.J., Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A.
84:393-397(1987). [3] Parsot C. EMBO J. 5:3013-3019(1986). [4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends
Biochem. Sci. 18:297-300(1993).

[1472] Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site

Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for the formation of cysteine
from O-acetyl-serine and hydrogen sulfide with the concomitant release of acetic acid. In bacteria such as Escherichia
coli, two forms of the enzyme are known (genes cysK and cysM). In plants there are also two forms, one located in the
cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first irreversible step in homocysteine
transsulfuration: the conjugation of homocysteine and serine, forming cystathionine. Like CSase, it is a pyridoxal-phosphate
dependent enzyme. The two types of enzymes are evolutionarily related. The pyridoxal-phosphate group of
CSases has been shown to be attached to a lysine residue which is located in the N-terminal section of these enzymes;
the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of
enzymes.

Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM] [The 2nd K is the pyridoxal-P
attachment site]

[1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993). [2] Swaroop M., Bradley K., Ohura T., Tahara
T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem. 267:11455-11461(1992).
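The consensus patterns in this listing use a compact residue syntax: single letters for fixed residues, square brackets for alternatives, x for any residue, and a parenthesised count for repeats. As a minimal sketch that is not part of the original disclosure, such a pattern can be translated into a regular expression and searched against a protein sequence. The function name `prosite_to_regex`, the constant names and the example sequence are hypothetical; the sequence is an invented string built to contain one match and is not a real cysteine synthase.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a regex string.

    Handles fixed residues, [..] alternatives, {..} exclusions, x wildcards
    and repeat counts such as (3) or (4,5). Anchors (< and >) are not handled.
    """
    parts = []
    # A trailing "-" (as printed after some patterns in this listing) is ignored.
    for element in pattern.strip().rstrip("-").split("-"):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                          # x means any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # {..} means any residue except these
        # [..] classes and single letters carry over unchanged
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


# Cysteine synthase / cystathionine beta-synthase pattern from the entry above.
CYS_SYNTHASE = "K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM]"
# Every element before the second K has a fixed width, so the attachment-site
# lysine sits at a constant offset from the start of a match.
SECOND_K_OFFSET = 11

sequence = "MSTRKAELLNPGGSVKDRIALSMGTE"   # invented example sequence
regex = prosite_to_regex(CYS_SYNTHASE)
hit = re.search(regex, sequence)
if hit:
    lysine_index = hit.start() + SECOND_K_OFFSET
    print(regex)
    print("match starts at", hit.start(), "; attachment-site K at index", lysine_index)
```

Indices are 0-based. Patterns whose elements before the residue of interest have variable width, such as x(4,5), would need a capture group rather than a fixed offset.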
[1473] 602. S locus glycop

S-locus glycoprotein family. In Brassicaceae, self-incompatible plants have a self/non-self recognition system.
This is sporophytically controlled by multiple alleles at a single locus (S). S-locus glycoproteins,
as well as S-receptor kinases, are in linkage with the S-alleles [1]. Number of members: 128
[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and nonsynonymous
base substitutions. Hinata K, Watanabe M, Yamakawa S, Satta Y, Isogai A; Genetics 1995;140:1099-1104.
[2] Polymorphism of the S-locus glycoprotein gene (SLG) and the S-locus related gene (SLR1) in Raphanus sativus
L. and self-incompatible ornamental plants in the Brassicaceae. Sakamoto K, Kusaba M, Nishio T; Mol Gen Genet
1998;258:397-403.

[1474] 603. (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures 

Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic component
composed of an FAD-binding flavoprotein and an iron-sulfur protein, and a hydrophobic component composed
of a cytochrome b and a membrane anchor protein. The cytochrome b component is a mono-heme transmembrane
protein [1,2,3] belonging to a family that groups: - Cytochrome b-556 from bacterial SDH (gene sdhC). - Cytochrome
b560 from the mammalian mitochondrial SDH complex. - Cytochrome b560 subunit encoded in the mitochondrial genome
of some algae and in the plant Marchantia polymorpha. - Cytochrome b from yeast mitochondrial SDH complex
(gene SDH3 or CYB3). - Protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues that
comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the heme
group. Two signature patterns have been developed that include these histidine residues.
Consensus pattern: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST] [H could be a heme ligand]
Consensus pattern: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-[GVA] [H could be a heme ligand]
[1] Yu L., Wei Y.-Y., Usui S., Yu C.-A. J. Biol. Chem. 267:24508-24515(1992). [2] Abraham P.R., Mulder A., Van't Riet
J., Raue H.A. Mol. Gen. Genet. 242:708-716(1994).
[3] Leblanc C., Boyen C., Richard O., Bonnard G., Grienenberger J.M., Kloareg B. J. Mol. Biol. 250:484-495(1995).
[1475] 604. Sec1 family

[1] The Sec1 family: a novel family of proteins involved in synaptic transmission and general secretion. Halachmi N,
Lev Z; J Neurochem 1996;66:889-897.
Number of members: 40

[1476] 605. Protein secE/sec61-gamma signature

In bacteria, the secE protein plays a role in protein export; it is one of the components, with secY and secA, of the
preprotein translocase. In eukaryotes, the evolutionarily related protein sec61-gamma plays a role in protein translocation
through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and beta [1]. Both
secE and sec61-gamma are small proteins of about 60 to 90 amino acids that contain a single transmembrane region
at their C-terminal extremity (Escherichia coli secE is an exception, in that it possesses an extra N-terminal segment of
60 residues that contains two additional transmembrane domains). The sequence of secE/sec61-gamma is not extremely
well conserved; however, it is possible to derive a signature pattern centered on a conserved proline located 10
residues before the beginning of the transmembrane domain.

Consensus pattern: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMFTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-[LIVT]-[LIVGA]-[LIVFGAST]

[1] Hartmann E., Sommer T., Prehn S., Goerlich D., Jentsch S., Rapoport T.A. Nature 367:654-657(1994).
[1477] 606. 11-S plant seed storage proteins signature
Plant seed storage proteins, whose principal function appears to be the major nitrogen source for the developing plant,
can be classified, on the basis of their structure, into different families. 11-S are non-glycosylated proteins which form
hexameric structures [1,2]. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain
derived from a single precursor and linked by a disulfide bond. This structure is shown in the following representation:

           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
                                  *******
<----------Acidic-subunit---------><------Basic-subunit----->
<-----------------About-480-to-500-residues----------------->

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern. Proteins that belong to the 11-S
family are: pea and broad bean legumins, rape cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins,
pumpkin 11-S globulin, oat globulin, sunflower helianthinin G3, etc. The region that includes the conserved cleavage
site between the acidic and basic subunits (Asn-Gly) and a proximal cysteine residue which is involved in the
interchain disulfide bond has been used as a signature pattern for this family of proteins.

Consensus pattern: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D [C is involved in a disulfide bond]
[1] Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem. 172:627-632(1988). [2] Shotwell
M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A. Plant Physiol. 87:698-704(1988).
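The Asn-Gly cleavage described for the 11-S family can be sketched in code. The snippet below is an illustration only: it writes the 11-S consensus pattern as a regular expression and splits a precursor between the Asn and the Gly that open the signature, giving the acidic and basic chains. The function name `split_at_asn_gly` and the precursor string are hypothetical; the sequence is invented for the example and is far shorter than the 480 to 500 residues of a real subunit.

```python
import re

# Regex form of N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D
SIGNATURE_11S = re.compile(r"NG.[DE]{2}.[LIVMF]C[ST].{11,12}[PAG]D")


def split_at_asn_gly(precursor: str):
    """Return (acidic_chain, basic_chain), or None if the signature is absent."""
    m = SIGNATURE_11S.search(precursor)
    if m is None:
        return None
    cut = m.start() + 1          # cleavage falls between the Asn and the Gly
    return precursor[:cut], precursor[cut:]


# Invented precursor: 8 leading residues, the signature, 2 trailing residues.
toy_precursor = "MAKLQSRE" + "NGLEETICT" + "A" * 11 + "PD" + "KK"
chains = split_at_asn_gly(toy_precursor)
if chains:
    acidic, basic = chains
    print("acidic chain ends with:", acidic[-1])
    print("basic chain starts with:", basic[0])
```

The acidic chain keeps the Asn as its last residue and the basic chain begins with the Gly, which matches the description of the cleavage site lying between the two subunits.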
[1478] 607. 7S seed storage protein

[1479] 7S globulin is one of the main storage proteins of most angiosperms and gymnosperms. The 7S storage
proteins are homotrimers.
Number of members: 67
[1480] [1] The three-dimensional structure of canavalin from jack bean (Canavalia ensiformis). Ko TP, Ng JD, McPherson
A; Plant Physiol 1993;101:729-744.

[1481] 608. Aspartate-semialdehyde dehydrogenase signature

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common biosynthetic pathway leading
from Asp to diaminopimelate and Lys, to Met, and to Thr: the NADP-dependent reductive dephosphorylation of L-aspartyl
phosphate to L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370
residues) whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been implicated as
important for the catalytic activity [2]. The region of conservation around the active site residue is too small to be used
as a signature pattern. Another more conserved region, located in the last third of the sequence, and which contains
both a conserved cysteine as well as a histidine, has been used instead.
Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[1] Baril C., Richaud C., Fournie E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992). [2] Karsten W.E.,
Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).

[1482] N-acetyl-gamma-glutamyl-phosphate reductase active site

N-acetyl-gamma-glutamyl-phosphate reductase
(EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes the third step in the biosynthesis of arginine from glutamate,
the NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into N-acetylglutamate 5-semialdehyde. In bacteria
it is a monofunctional protein of 35 to 38 Kd (gene argC), while in fungi it is part of a bifunctional mitochondrial enzyme
(gene ARG5,6, arg11 or arg-6) which contains an N-terminal acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal
AGPR domain. In the Escherichia coli enzyme, a cysteine has been shown to be implicated in the catalytic activity; the
region around this residue is well conserved and can be used as a signature pattern.

Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P [C is the active site residue]
[1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992). [2] Gessert S.F., Kim J.H.,
Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).
[1483] 609. Sialyltransferase family
Number of members: 18

[1484] 610. SpoU rRNA Methylase family

This family of proteins probably uses S-AdoMet. Number of members: 58

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. Koonin EV, Rudd KE; Nucleic
Acids Res 1993;21:5519-5519. [2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential
for tRNA (Gm18) 2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:
4093-4097.

[1485] 611. Stathmin family signatures

Stathmin [1] (from the Greek 'stathmos', which means relay) is a ubiquitous intracellular protein, present in a variety
of phosphorylated forms, which serves as a relay for diverse second messenger pathways. Its expression and
phosphorylation are regulated throughout development and in response to extracellular signals regulating cell proliferation,
differentiation and function. Stathmin is a highly conserved protein of 149 amino acid residues. Structurally, it
consists of an N-terminal domain of about 45 residues followed by a 78 residue alpha-helical domain consisting of a
heptad repeat coiled coil structure and a C-terminal domain of 25 residues. Protein SCG10 is a neuron-specific,
membrane-associated protein that accumulates in the growth cones of developing neurons. It is highly similar in its sequence
to stathmin, but differs in that it contains an additional N-terminal hydrophobic segment of 32 residues which is probably
responsible for its interaction with membranes. Xenopus protein XB3 is also evolutionarily related to stathmin and also
contains an additional N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three
residues of the coiled coil domain and a second pattern that corresponds to part of the central region of the coiled coil
have been selected as signatures for proteins of the stathmin family.

Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E
Consensus pattern: A-E-K-R-E-H-E-[KR]-E

[1] Sobel A. Trends Biochem. Sci. 16:301-305(1991). [2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem.
268:16420-16429(1993).
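Where a family is described by two signatures, as for stathmin, a sequence can be screened for both. The sketch below is an illustration under stated assumptions: the two patterns are written by hand as regular expressions, and the second is searched only downstream of the first because the text places the first at the start of the coiled coil and the second in its central region. The function name `stathmin_signature_hits` and the test sequence are hypothetical; the sequence is invented and is not a real stathmin.

```python
import re

SIG1 = re.compile(r"P[KRQ][KR]{2}[DE].SL[EG]E")   # P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E
SIG2 = re.compile(r"AEKREHE[KR]E")                # A-E-K-R-E-H-E-[KR]-E


def stathmin_signature_hits(seq: str):
    """Return 0-based start positions (first, second); None where a signature is absent.

    The second signature is only looked for after the end of the first one.
    """
    first = SIG1.search(seq)
    second = SIG2.search(seq, first.end()) if first else None
    return (
        first.start() if first else None,
        second.start() if second else None,
    )


# Invented sequence carrying both signatures in the expected order.
toy_sequence = "MAS" + "PKKKDLSLEE" + "IQKKLEAAEE" + "AEKREHEKE" + "VLQ"
print(stathmin_signature_hits(toy_sequence))
```

Requiring both hits, in order, is a stricter screen than accepting either pattern alone; whether that strictness is appropriate depends on how the signatures are meant to be applied.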

[1486] 612. SUA5/yciO/yrdC family signature. The following uncharacterized proteins have been shown [1] to share
regions of similarities: - Yeast protein SUA5. - Escherichia coli hypothetical protein yciO and HI1198, the corresponding
Haemophilus influenzae protein. - Escherichia coli hypothetical protein yrdC and HI0656, the corresponding Haemophilus
influenzae protein. - Bacillus subtilis hypothetical protein ywtC. - Mycobacterium leprae hypothetical protein in
rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical protein MJ0062. These are proteins of from 20
to 46 Kd which contain a number of conserved regions in their N-terminal section. They can be picked up in the database
by the following pattern.

[1487] Consensus pattern: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS]
[1488] [1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).
[1489] 613. Sucrose synthase
Sucrose synthases catalyse the synthesis of sucrose from UDP-glucose and fructose. This family includes the bulk of
the sucrose synthase protein. However, the carboxyl terminal region of the sucrose synthases belongs to the glycosyl
transferase family Glycos_transf_1.
[1490] 614. Sulfotransferase proteins
Number of members: 59

[1491] 615. Synaptophysin / synaptoporin signature

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane of synaptic vesicles, which
may function as ionic or solute channels. These two glycoproteins seem to span the membrane four times. Both their
N- and C-terminal sequences seem to be cytoplasmically located. As a signature pattern for this family of proteins, a
highly conserved region located in the beginning of the first intravesicular loop just after the first transmembrane domain
has been selected. This region contains a cysteine residue that may be involved in a disulfide bond.
Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]
[1] Knaus P., Marqueze-Pouey B., Scherer H., Betz H. Neuron 5:453-462(1990).
[1492] 616. Syndecans signature

Syndecans [1,2] (from the Greek syndein, to bind together) are a family of transmembrane heparan sulfate proteoglycans
which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a
variety of molecules via their heparan sulfate chains and can act as receptors or as co-receptors. Structurally, these
proteins consist of four separate domains: a) A signal sequence; b) An extracellular domain (ectodomain) of variable
length and whose sequence is not evolutionarily conserved in the various forms of syndecans. The ectodomain contains
the sites of attachment of the heparan sulfate glycosaminoglycan side chains; c) A transmembrane region; d) A highly
conserved cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal proteins. The proteins
known to belong to this family are: - Syndecan 1. - Syndecan 2 or fibroglycan. - Syndecan 3 or neuroglycan or N-syndecan.
- Syndecan 4 or amphiglycan or ryudocan. - Drosophila syndecan. - Caenorhabditis elegans probable syndecan
(F57C7.3). The signature pattern that has been developed for syndecans starts with the last residue of the
transmembrane region and includes the first 10 residues of the cytoplasmic domain. This region, which contains four
basic residues, could act as a stop transfer site.
Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y

[1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. 8:365-393
(1992). [2] David G. FASEB J. 7:1023-1030(1993).

[1493] 617. Syntaxin / epimorphin family signature

The following proteins have been shown to be evolutionarily related [1,2,3]: - Epimorphin (or syntaxin 2), a mammalian
mesenchymal protein which plays an essential role in epithelial morphogenesis. - Syntaxin 1A (also known as antigen
HPC-1) and syntaxin 1B, which are synaptic proteins which may be involved in docking of synaptic vesicles at presynaptic
active zones. - Syntaxin 3. - Syntaxin 4, which is potentially involved in docking of synaptic vesicles at presynaptic
active zones. - Syntaxin 5, which mediates endoplasmic reticulum to Golgi transport. - Syntaxin 6, which is involved in
intracellular vesicle trafficking. - Syntaxin 7. - Yeast PEP12 (or VPS6), which is required for the transport of proteases
to the vacuole. - Yeast SED5, which is required for the fusion of transport vesicles with the Golgi complex. - Yeast SSO1
and SSO2, which are required for vesicle fusion with the plasma membrane. - Yeast VAM3, which is required for vacuolar
assembly. - Arabidopsis thaliana protein KNOLLE, which may be involved in cytokinesis. - Caenorhabditis elegans
hypothetical proteins F35C8.4, F48F7.2, F55A11.2 and T01B11.3. The above proteins share the following characteristics:
a size ranging from 30 Kd to 40 Kd; a C-terminal extremity which is highly hydrophobic and is probably involved
in anchoring the protein to the membrane; a central, well conserved region, which seems to be in a coiled-coil conformation.
The pattern specific for this family is based on the most conserved region of the coiled coil domain.
Consensus pattern: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-[LIVM]
[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-[LIVMF]-[DESV]-x(2)-[LI...]

[1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka C.D., Scheller R.H. Cell 74:
863-873(1993). [2] Spring J., Kato M., Bernfield M. Trends Biochem. Sci. 18:124-125(1993). [3] Pelham H.R.B. Cell
73:425-426(1993).
[1494] 618. Sm protein

The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain
seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of
the major spliceosomal small nuclear RNAs. These proteins contain a common sequence motif in two segments, Sm1
and Sm2, separated by a short variable linker.

[1495] [1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:
2076-2088. [2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell
1999;96:375-387.
[1496] 619. Skp1 family

[1497] [1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461.
[1498] 620. Protein secY signatures

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the signal sequences of
secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is
an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane
segments. Such a structure probably confers to secY a 'translocator' function, providing a channel for periplasmic and
outer-membrane precursor proteins. Homologs of secY are found in archaebacteria [2]. SecY is also encoded in the
chloroplast genome of some algae [3], where it could be involved in a prokaryotic-like protein export system across the
two membranes of the chloroplast endoplasmic reticulum (CER) which is present in chromophyte and cryptophyte algae.
Two signature patterns have been developed for secY proteins. The first corresponds to the second transmembrane
region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth
transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region.
Consensus pattern: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-[LI...](2)

Consensus pattern: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-...

[1] Ito K. Mol. Microbiol. 6:2423-2428(1992). [2] Auer J., Spicker G., Boeck A. Biochimie 73:683-688(1991). [3] Douglas
S.E. FEBS Lett. 298:93-96(1992).

[1499] 621. (Seed protein) Small hydrophilic plant seed proteins signature. The following small hydrophilic plant seed
proteins are structurally related: - Arabidopsis thaliana proteins GEA1 and GEA6. - Cotton late embryogenesis abundant
(LEA) protein D-19. - Carrot EMB-1 protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize late
embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant
protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein (DS10). - Wheat Em proteins. These proteins
contain from 83 to 153 amino acid residues and may play a role [1,2] in equipping the seed for survival, maintaining
a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They
may also play a role during imbibition by controlling water uptake. As a signature pattern, the best conserved region
in the sequence of these proteins has been selected; it is a glycine-rich nonapeptide located in the N-terminal section.
[1500] Consensus pattern: G-[EQ]-T-V-V-P-G-G-T

[1501] [1] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol.
Biol. 12:475-486(1989). [2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M.
Mol. Gen. Genet. 238:409-418(1993).
[1502] 622. Serine carboxypeptidases, active sites

All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases. The catalytic activity
of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system
involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1].
Proteins known to be serine carboxypeptidases are: - Barley and wheat serine carboxypeptidases I, II, and III [2]. -
Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast
KEX1 protease, involved in killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable
carboxypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium janthinellum carboxypeptidase
S1 [4]. - Aspergillus niger carboxypeptidase pepF. - Aspergillus saitoi carboxypeptidase cpdS. - Vertebrate protective
protein / cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also essential for the activity of
both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase (VCP) [6]. - Naegleria fowleri
virulence-related protein Nf314 [7]. - Yeast hypothetical protein YBR139w. - Caenorhabditis elegans hypothetical proteins
C08H9.1, F13D12.6, F32A5.3, F41C3.5 and K10B2.2. This family also includes: - Sorghum (s)-hydroxymandelonitrile
lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis. The sequences surrounding
the active site serine and histidine residues are highly conserved in all these serine carboxypeptidases.
Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]

Consensus pattern: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA] [H is the active
site residue]

[1] Liao D.L., Remington S.J. J. Biol. Chem. 265:6528-6531(1990). [2] Sorensen S.B., Svendsen I., Breddam K.
Carlsberg Res. Commun. 54:193-202(1989). [3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992). [4] Svendsen
I., Hofmann T., Endrizzi J., Remington J., Breddam K. FEBS Lett. 333:39-43(1993). [5] Galjart N.J., Morreau
H., Willemsen R., Gillemans N., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991). [6] Cho W.L., Deitsch
K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991). [7] Hu W.N., Kopachik W., Band R.N. Infect.
Immun. 60:2418-2424(1992). [8] Wajant H., Mundry K.W., Pfizenmaier K. Plant Mol. Biol. 26:735-746(1994). [9] Rawlings
N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1503] 623. Serpins signature. Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of structurally related
proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors
with a well defined structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine
protease. This region is found in the C-terminal part of these proteins. Proteins which are known to belong to the serpin
family are listed below (references are only provided for recently determined sequences): - Alpha-1 protease inhibitor
(alpha-1-antitrypsin, contrapsin). - Alpha-1-antichymotrypsin. - Antithrombin III. - Alpha-2-antiplasmin. - Heparin cofactor
II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2). - Glia derived nexin
(GDN) (Protease nexin I). - Protein C inhibitor. - Rat hepatocytes SPI-1, SPI-2 and SPI-3 inhibitors. - Human squamous
cell carcinoma antigen (SCCA), which may act in the modulation of the host immune response against tumor cells. - A
lepidopteran protease inhibitor. - Leukocyte elastase inhibitor which, in contrast to other serpins, is an intracellular
protein. - Neuroserpin [5], a neuronal inhibitor of plasminogen activators and plasmin. - Cowpox virus crmA [6], an
inhibitor of the thiol protease interleukin-1 beta converting enzyme (ICE). CrmA is the only serpin known to inhibit a
non-serine proteinase. - Some orthopoxviruses probable protease inhibitors, which may be involved in the regulation of the
blood clotting cascade and/or of the complement cascade in the mammalian host. On the basis of strong sequence
similarities, a number of proteins with no known inhibitory activity are said to belong to this family: - Birds ovalbumin
and the related genes X and Y proteins. - Angiotensinogen, the precursor of the angiotensin active peptide. - Barley
protein Z, the major endosperm albumin. - Corticosteroid binding globulin (CBG). - Thyroxine-binding globulin (TBG).
- Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein (UFAP). - Hsp47, an endoplasmic reticulum
heat-shock protein that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic pathway
[7]. - Maspin, which seems to function as a tumor suppressor [8]. - Pigment epithelium-derived factor precursor (PEDF),
a protein with a strong neurotrophic activity [9]. - Ep45, an estrogen-regulated protein from Xenopus [10]. A signature
pattern has been developed for this family of proteins, centered on a well conserved Pro-Phe sequence which is found
ten to fifteen residues on the C-terminal side of the reactive bond.

[1504] Consensus pattern: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH]
[1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985). [2] Carrell R., Pemberton P.A., Boswell D.R. Cold Spring
Harbor Symp. Quant. Biol. 52:527-535(1987). [3] Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989). [4]
Remold-O'Donnell E. FEBS Lett. 315:105-108(1993). [5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B., Sonderegger
P. EMBO J. 15:2944-2953(1996). [6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., Thornberry N.A.,
Peterson E.P., Salvesen G. J. Biol. Chem. 269:19331-19337(1994). [7] Clarke E., Sanwal B.D. Biochim. Biophys.
Acta 1129:246-248(1992). [8] Zou Z., Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E.,
Thor A. Science 263:526-529(1994). [9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. Proc. Natl. Acad.
Sci. U.S.A. 90:1526-1530(1993). [10] Holland L.J., Suksang C., Wall A.A., Roberts L.R., Moser D.R., Bhattacharya A.
J. Biol. Chem. 267:7053-7059(1992).

[1505] 624. Sigma-54 interaction domain signatures and profile

Some bacterial regulatory proteins activate the expression of genes from promoters recognized by core RNA polymerase associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues involved in the ATP-dependent [1,2] interaction with sigma-54. This domain has been found in the proteins listed below:
- acoR from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC.
- algB from Pseudomonas aeruginosa, an activator of alginate biosynthetic gene algD.
- dctD from Rhizobium, an activator of dctA, the C4-dicarboxylate transport protein.
- dhaR from Citrobacter freundii, a regulator of the dha operon for glycerol utilization.
- fhlA from Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural genes.
- flbD from Caulobacter crescentus, an activator of flagellar genes.
- hoxA from Alcaligenes eutrophus, an activator of the hydrogenase operon.
- hrpS from Pseudomonas syringae, an activator of hprD as well as other hrp loci involved in plant pathogenicity.
- hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes hupSL.
- hydG from Escherichia coli and Salmonella typhimurium, an activator of the hydrogenase activity.
- levR from Bacillus subtilis, which regulates the expression of the levanase operon (levDEFG and sacC).
- nifA (as well as anfA and vnfA) from various bacteria, an activator of the nif nitrogen-fixing operon.
- ntrC, from various bacteria, an activator of nitrogen assimilatory genes such as that for glutamine synthetase (glnA) or of the nif operon.
- pgtA from Salmonella typhimurium, the activator of the inducible phosphoglycerate transport system.
- pilR from Pseudomonas aeruginosa, an activator of pilin gene transcription.
- rocR from Bacillus subtilis, an activator of genes for arginine utilization.
- tyrR from Escherichia coli, involved in the transcriptional regulation of aromatic amino-acid biosynthesis and transport.
- wtsA, from Erwinia stewartii, an activator of plant pathogenicity gene wtsB.
- xylR from Pseudomonas putida, the activator of the tol plasmid xylene catabolism operon xylCAB and of xylS.
- Escherichia coli hypothetical protein yfhA.
- Escherichia coli hypothetical protein yhgB.

About half of these proteins (algB, dctD, flbD, hoxA, hupR1, hydG, ntrC, pgtA and pilR) belong to signal transduction two-component systems [3] and possess a domain that can be phosphorylated by a sensor-kinase protein in their N-terminal section. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase activity. This may be required to promote a conformational change necessary for the interaction [4]. The domain contains an atypical ATP-binding motif A (P-loop) as well as a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the domain; signature patterns have been developed for both motifs. Other regions of the domain are also conserved. One of them, located in the C-terminal section, has been selected as a third signature pattern.
Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY]
Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVM...
Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT]

[1] Morett E., Segovia L. J. Bacteriol. 175:6067-6074(1993). [2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res. 19:2281-2287(1991). [3] Albright L.M., Huala E., Ausubel F.M. Annu. Rev. Genet. 23:311-336(1989). [4] Austin S., Dixon R. EMBO J. 11:2219-2228(1992).

[1506] 625. Sigma-70 factors family signatures

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase to specific initiation sites and are then released. They alter the specificity of promoter recognition. Most bacteria express a multiplicity of sigma factors. Two of these factors, sigma-70 (gene rpoD), generally known as the major or primary sigma factor, and sigma-54 (gene rpoN or ntrA) direct the transcription of a wide variety of genes. The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard to sequence similarity, sigma factors can be grouped into two classes: the sigma-54 and sigma-70 families. The sigma-70 family includes, in addition to the primary sigma factor, a wide variety of sigma factors, some of which are listed below:
- Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB), sigma-F (sigF or spoIIAC), sigma-G (sigG or spoIIIG), sigma-H (sigH or spoOC) and sigma-K (sigK or spoIVCB/spoIIIC).
- Escherichia coli and related bacteria sigma-32 (gene rpoH or htpR) involved in the expression of heat shock genes.
- Escherichia coli and related bacteria sigma-27 (gene fliA) involved in the expression of the flagellin gene.
- Escherichia coli sigma-S (gene rpoS or katF) which seems to be involved in the expression of genes required for protection against external stresses.
- Myxococcus xanthus sigma-B (sigB) which is essential for the late-stage differentiation of that bacterium.

Alignments of the sigma-70 family permit the identification of four regions of high conservation [2,3]. Each of these four regions can in turn be subdivided into a number of sub-regions. Signature patterns based on the two best-conserved sub-regions have been developed. The first pattern corresponds to sub-region 2.2; the exact function of this sub-region is not known although it could be involved in the binding of the sigma factor to the core RNA polymerase. The second pattern corresponds to sub-region 4.2 which seems to harbor a DNA-binding 'helix-turn-helix' motif involved in binding the conserved -35 region of promoters recognized by the major sigma factors. The second pattern starts one residue before the N-terminal extremity of the HTH region and ends six residues after its C-terminal extremity.
Consensus pattern: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP]
Consensus pattern: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM]

[1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988). [2] Gribskov M., Burgess R.R. Nucleic Acids Res. 14:6745-6763(1986). [3] Lonetto M.A., Gribskov M., Gross C.A. J. Bacteriol. 174:3843-3849(1992). [4] Lonetto M.A., Brown K.L., Rudd K.E., Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).
[1507] 626. Signal carboxyl-terminal domain. 430 members.
[1508] 627. Signal peptidases I signatures 

Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the majority of exported pre-proteins; type II (gene lsp), which only processes lipoproteins; and a third type involved in the processing of pili subunits. SPase I is an integral membrane protein that is anchored in the cytoplasmic membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding in the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2 (genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these proteins have been developed. The first signature contains the putative active site serine, the second signature contains the putative active site lysine, which is not conserved in the SPC subunits, and the third signature corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all these proteins.
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]
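
Consensus patterns of this kind can be checked against a sequence mechanically. The following is a minimal sketch, assuming the usual reading of the notation (x for any residue, square brackets for allowed residues, curly braces for excluded residues, a parenthesized number or range for repeats). The function name `prosite_to_regex` and the toy sequence are illustrative only and are not part of the described procedure.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional repeat,
        # e.g. "x(3)" or "x(2,5)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # Bracketed sets such as [GS] and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        pieces.append(core)
    return "".join(pieces)

# First signal peptidase I signature (contains the active site serine).
serine_site = prosite_to_regex("[GS]-x-S-M-x-[PS]-[AT]-[LF]")

toy_sequence = "MKGNSMLPTLAA"   # made-up sequence, for illustration only
hit = re.search(serine_site, toy_sequence)
if hit:
    # The serine is the third residue of the pattern.
    print("match", hit.group(), "serine at position", hit.start() + 3)
```

The translation is deliberately small: it does not handle anchors such as `<` and `>`, which none of the patterns in this section use.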

[1] Dalbey R.E., von Heijne G. Trends Biochem. Sci. 17:474-478(1992). [2] Sung M., Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992). [3] Black M.T. J. Bacteriol. 175:4957-4961(1993). [4] Nunnari J., Fox T.D., Walter P. Science 262:1997-2004(1993). [5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992). [6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1509] 628. (sodcu) Copper/Zinc superoxide dismutase signatures
Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that catalyzes the dismutation of superoxide radicals. SODC binds one atom each of zinc and copper. Various forms of SODC are known: a cytoplasmic form in eukaryotes, an additional chloroplast form in plants, an extracellular form in some eukaryotes, and a periplasmic form in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. Two signature patterns have been derived for this family of enzymes: the first one contains two histidine residues that bind the copper atom; the second one is located in the C-terminal section of SODC and contains a cysteine which is involved in a disulfide bond.
Consensus pattern: [GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE] [The two H's are copper ligands]

Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide bond]
[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).

[1510] 629. (sodfe) Manganese and iron superoxide dismutases signature

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that catalyzes the dismutation of superoxide radicals. The four ligands of the manganese atom are conserved in all the known SODM sequences. These metal ligands are also conserved in the related iron form of superoxide dismutases [2,3]. A short conserved region which includes two of the four ligands, an aspartate and a histidine, has been selected as a signature.
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands]

[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Parker M.W., Blake C.C.F. FEBS Lett. 229:377-382(1988). [3] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).
[1511] 630. Spectrin repeat

[1512] Spectrin repeats are found in several proteins involved in cytoskeletal structure. These include spectrin, alpha-actinin and dystrophin. The sequence repeat used in this family is taken from the structural repeat in reference [2]. The spectrin repeat forms a three helix bundle. The second helix is interrupted by proline in some sequences.
Number of members: 898

[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile 1995;2:732-732. [2] Crystal structure of the repetitive segments of spectrin. Yan Y, Winograd E, Viel A, Cronin T, Harrison SC, Branton D; Science 1993;262:2027-2030.

[1513] 631. (subtilase) Streptomyces subtilisin-type inhibitors signature

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1] characterized by their strong activity toward subtilisin. They are collectively known as SSI's: Streptomyces Subtilisin Inhibitors. Some SSI's also inhibit trypsin or chymotrypsin. In their mature secreted form, SSI's are proteins of about 110 residues with two conserved disulfide bonds.

[Schematic representation of a mature SSI: xxxxxxxxxxxxxxCxxxxxxxCxxxxxxxxxCx#xxxxxxxxxxxxCxxxxxx]
'C': conserved cysteine involved in a disulfide bond. '#': active site residue. '*': position of the pattern.
Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two C's are involved in a disulfide bond]
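
The element x(2,3) in this pattern allows a gap of either two or three residues. As a hedged illustration, the sketch below writes the pattern by hand as a regular expression and captures the gap so that its length can be reported. The names `SSI_SIGNATURE` and `gap_length` and both test sequences are made up for this example.

```python
import re

# C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L, with the variable gap captured.
SSI_SIGNATURE = re.compile(r"C.P(.{2,3})G.HP.{4}AC[ATD].L")

def gap_length(sequence):
    """Return the length of the x(2,3) gap in the first match, or None."""
    m = SSI_SIGNATURE.search(sequence)
    return len(m.group(1)) if m else None

# Synthetic fragments that differ only in the length of the gap.
two_residue_gap = "CAP" + "AA" + "GAHP" + "AAAA" + "ACAAL"
three_residue_gap = "CAP" + "AAA" + "GAHP" + "AAAA" + "ACAAL"
four_residue_gap = "CAP" + "AAAA" + "GAHP" + "AAAA" + "ACAAL"

for fragment in (two_residue_gap, three_residue_gap, four_residue_gap):
    print(fragment, gap_length(fragment))
```

A gap of four residues falls outside the allowed range, so the last fragment is not matched.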
[1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911-918(1994).
[1514] 632. Sugar transport proteins signatures

In mammalian cells the uptake of glucose is mediated by a family of closely related transport proteins which are called the glucose transporters [1,2,3]. At least seven of these transporters are currently known to exist (in human they are encoded by the GLUT1 to GLUT7 genes). These integral membrane proteins are predicted to comprise twelve membrane spanning domains. The glucose transporters show sequence similarities [4,5] with a number of other sugar or metabolite transport proteins listed below (references are only provided for recently determined sequences):
- Escherichia coli arabinose-proton symport (araE).
- Escherichia coli galactose-proton symport (galP).
- Escherichia coli and Klebsiella pneumoniae citrate-proton symport (also known as citrate utilization determinant) (gene cit).
- Escherichia coli alpha-ketoglutarate permease (gene kgtP).
- Escherichia coli proline/betaine transporter (gene proP) [6].
- Escherichia coli xylose-proton symport (xylE).
- Zymomonas mobilis glucose facilitated diffusion protein (gene glf).
- Yeast high and low affinity glucose transport proteins (genes SNF3, HXT1 to HXT14).
- Yeast galactose transporter (gene GAL2).
- Yeast maltose permeases (genes MAL3T and MAL6T).
- Yeast myo-inositol transporters (genes ITR1 and ITR2).
- Yeast carboxylic acid transporter protein homolog JEN1.
- Yeast inorganic phosphate transporter (gene PHO84).
- Kluyveromyces lactis lactose permease (gene LAC12).
- Neurospora crassa quinate transporter (gene Qa-y), and Emericella nidulans quinate permease (gene qutD).
- Chlorella hexose carrier (gene HUP1).
- Arabidopsis thaliana glucose transporter (gene STP1).
- Spinach sucrose transporter.
- Leishmania donovani transporters D1 and D2.
- Leishmania enriettii probable transport protein (LTP).
- Yeast hypothetical proteins YBR241c, YCR98c and YFL040w.
- Caenorhabditis elegans hypothetical protein ZK637.1.
- Escherichia coli hypothetical proteins yabE, ydjE and yhjE.
- Haemophilus influenzae hypothetical proteins HI0281 and HI0418.
- Bacillus subtilis hypothetical proteins yxbC and yxdF.

It has been suggested [4] that these transport proteins have evolved from the duplication of an ancestral protein with six transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs. The first one is located between the second and third transmembrane domains and the second one between transmembrane domains 8 and 9. Two patterns have been developed to detect this family of proteins. The first pattern is based on the G-R-[KR] motif; but because this motif is too short to be specific to this family of proteins, a pattern from a larger region centered on the second copy of this motif was derived. The second pattern is based on a number of conserved residues which are located at the end of the fourth transmembrane segment and in the short loop region between the fourth and fifth segments.
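
Why a three-residue motif is "too short to be specific" can be seen from a rough calculation. The sketch below assumes, purely for illustration, that all 20 amino acids are equally likely and independent at every position, which real proteins do not satisfy. The per-site counts for the longer pattern assume it reads [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA]; wildcard positions contribute a factor of 1 and the variable gap is ignored.

```python
from fractions import Fraction

AMINO_ACIDS = 20  # assumption: uniform, independent residue frequencies

def chance_per_position(allowed_per_site):
    """Probability that a pattern matches at one given start position."""
    p = Fraction(1)
    for allowed in allowed_per_site:
        p *= Fraction(allowed, AMINO_ACIDS)
    return p

# G-R-[KR]: one, one and two allowed residues.
short_motif = chance_per_position([1, 1, 2])

# Constrained sites of the longer pattern, in order of appearance.
long_pattern = chance_per_position([8, 8, 6, 2, 8, 1, 1, 2, 4])

# Expected chance matches of the short motif in a hypothetical 500-residue protein.
protein_length = 500
expected_short_hits = (protein_length - 3 + 1) * short_motif

print("short motif, per position:", short_motif)
print("long pattern, per position:", float(long_pattern))
print("expected chance hits of short motif:", float(expected_short_hits))
```

Under these assumptions the short motif is expected by chance roughly once in every eight proteins of that length, whereas the longer pattern is several thousand times less likely to match at random.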

Consensus pattern: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-[LIVMFYWA]-G-R-[RK]-x(4,6)-[GSTA]
Consensus pattern: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-x(6)-[RK]

[1] Silverman M. Annu. Rev. Biochem. 60:757-794(1991). [2] Gould G.W., Bell G.I. Trends Biochem. Sci. 15:18-23(1990). [3] Baldwin S.A. Biochim. Biophys. Acta 1154:17-49(1993). [4] Maiden M.C.J., Davis E.G., Baldwin S.A., Moore D.C.M., Henderson P.J.F. Nature 325:641-643(1987). [5] Henderson P.J.F. Curr. Opin. Struct. Biol. 1:590-601(1991). [6] Culham D.E., Lasby B., Marangoni A.G., Milner J.L., Steer B.A., van Nues R.W., Wood J.M. J. Mol. Biol. 229:268-276(1993).

[1515] 633. Synaptobrevin signature 

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function is not yet known, but which is highly conserved in mammals, electric ray (where it is known as VAMP-1), Drosophila and yeast [2]. In yeast there are two closely related forms of synaptobrevin (genes SNC1 and SNC2) while in mammals there are at least four (genes SYB1, SYB2, SYB3 and SYBL1). Structurally synaptobrevin consists of an N-terminal cytoplasmic domain of from 90 to 110 residues, followed by a transmembrane region, and then by a short (from 2 to 22 residues) C-terminal intravesicular domain. As a signature pattern for synaptobrevin, a highly conserved stretch of residues located in the central part of the sequence was selected.

Consensus pattern: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]-[KR]-[TA]-[DE]
[1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989). [2] Gerst J.E., Rodgers L., Riggs M., Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

[1516] 634. TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small GTPases. Number of members: 55

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the tre-2 oncogene and the yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; Oncogene 1995;11:1139-1148.
[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244.

[1517] 635. Transcription factor TFIID repeat signature (TBP)

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that plays a major role in the activation of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element which lies close to the position of transcription initiation. There is a remarkable degree of sequence conservation of a C-terminal domain of about 180 residues in TFIID from various eukaryotic sources. This region is necessary and sufficient for TATA box binding. The most significant structural feature of this domain is the presence of two conserved repeats of a 77 amino-acid region. The intramolecular symmetry generates a saddle-shaped structure that sits astride the DNA [3]. Drosophila TRF (TBP-related factor) [4] is a sequence-specific transcription factor that also binds to the TATA box and is highly similar to TFIID. Archaebacteria also possess a TBP homolog [5]. A signature pattern that spans the last 50 residues of the repeated region has been derived.

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVMF]
[1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. Nature 346:387-390(1990). [2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-H. Nature 346:390-394(1990). [3] Nikolov D.B., Hu S.-H., Lin J., Gasch A., Hoffmann A., Horikoshi M., Chua N.-H., Roeder R.G., Burley S.K. Nature 360:40-46(1992). [4] Crowley T.E., Hoey T., Liu J.-K., Jan Y.N., Jan L.Y., Tjian R. Nature 361:557-561(1993). [5] Marsh T.L., Reich C.I., Whitelock R.B., Olsen G.J. Proc. Natl. Acad. Sci. U.S.A. 91:4180-4184(1994).

[1518] 636. Translationally controlled tumor protein signatures (TCTP)

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has been found to be preferentially synthesized in cells during the early growth phase of some types of tumor [1,2], but which is also expressed in normal cells. The physiological function of TCTP is still not known. It is a hydrophilic protein of 18 to 20 Kd. Close homologs have been found in plants [3], earthworm [4], Caenorhabditis elegans (F52H2.11), Hydra, budding yeast (YKL056C) [5] and fission yeast (SpAC1F12.02c). Two of the best conserved regions have been selected as signature patterns for TCTP.

Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA]

Consensus pattern: [FL...]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-[AV]-x(3)-[FYW]
[1] Boehm H., Benndorf R., Gaestel M., Gross B., Nuernberg P., Kraft R., Otto A., Bielka H. Biochem. Int. 19:277-286(1989). [2] Makrides S., Chitpatima S.T., Bandyopadhyay R., Brawerman G. Nucleic Acids Res. 16:2350-2350(1988). [3] Pay A., Heberle-Bors E., Hirt H. Plant Mol. Biol. 19:501-503(1992). [4] Stuerzenbaum S.R., Kille P., Morgan A.J. Biochim. Biophys. Acta 1398:294-304(1998). [5] Rasmussen S.W. Yeast 10:S63-S68(1994).

[1519] 637. TFIIS zinc ribbon domain signature

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA polymerase II transcription elongation past template-encoded pause sites. TFIIS shows DNA-binding activity only in the presence of RNA polymerase II. It is a protein of about 300 amino acids whose sequence is highly conserved in mammals, Drosophila, yeast (where it was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA strand transfer protein alpha [2]) and in the archaebacterium Sulfolobus acidocaldarius [3]. This family also includes the eukaryotic and archaebacterial RNA polymerase subunits of the 15 Kd / M family (see <PDOC00790>) as well as the following viral proteins:
- Vaccinia virus RNA polymerase 30 Kd subunit (rpo30) [4].
- African swine fever virus protein I243L [5].

The best conserved region of all these proteins contains four cysteines that bind a zinc ion and fold in a conformation termed a 'zinc ribbon' [6]. Besides these cysteines, there are a number of other conserved residues which can be used to help define a specific pattern for this type of domain.

Consensus pattern: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-x...(3)-[FW] [The four C's are zinc ligands]

[1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988). [2] Kipling D., Kearsey S.E. Nature 353:509-509(1991). [3] Langer D., Zillig W. Nucleic Acids Res. 21:2251-2251(1993). [4] Ahn B.-Y., Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol. 10:5433-5441(1990). [5] Rodriguez J.M., Salas M.L., Vinuela E. Virology 186:40-52(1992). [6] Qian X., Jeon C., Yoon H., Agarwal K., Weiss M.A. Nature 365:277-279(1993).
[1520] 638. Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH)

Enzymes that participate in the transfer of one-carbon units are involved in various biosynthetic pathways. In many of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5 or EC 1.5.1.15) and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The dehydrogenase and cyclohydrolase activities are expressed by a variety of multifunctional enzymes:
- Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase), which catalyzes all three reactions described above. Two forms of C1-THF synthases are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms the dehydrogenase/cyclohydrolase domain is located in the N-terminal section of the 900 amino acid protein and consists of about 300 amino acid residues. The C1-THF synthases are NADP-dependent.
- Eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase [2]. This is a homodimeric NAD-dependent enzyme of about 300 amino acid residues.
- Bacterial folD [3]. FolD is a homodimeric bifunctional NADP-dependent enzyme of about 290 amino acid residues.

The sequence of the dehydrogenase/cyclohydrolase domain is highly conserved in all forms of the enzyme. Two conserved regions have been selected as signature patterns. The first one is located in the N-terminal part of these enzymes and contains three acidic residues. The second pattern is a highly conserved sequence of 9 amino acids which is located in the C-terminal section.

Consensus pattern: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-[LIVMF](3)-Q...
Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV]

[1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). [2] Belanger C., Mackenzie R.E. J. Biol. Chem. 264:4837-4843(1989). [3] d'Ari L., Rabinowitz J.C. J. Biol. Chem. 266:23953-23958(1991).
[1521] 639. Triosephosphate isomerase active site (TIM)

Triosephosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several metabolic pathways and is essential for efficient energy production. It is a dimer of identical subunits, each of which is made up of about 250 amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism [2]. The sequence around the active site residue is perfectly conserved in all known TIM's and can be used as a signature pattern for this type of enzyme.

Consensus pattern: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK] [E is the active site residue]
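
Because the pattern fixes where the catalytic glutamate sits relative to the match, a hit can be turned directly into a residue position. The sketch below is illustrative only: it assumes the pattern reads [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK], and the names `TIM_SIGNATURE`, `GLU_OFFSET` and `catalytic_glutamate_positions`, as well as the flanking residues of the toy sequence, are made up for this example.

```python
import re

# [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK], written by hand as a regular expression.
TIM_SIGNATURE = re.compile(r"[AV]YEP[LIVM]W[SA]IGT[GK]")
GLU_OFFSET = 2  # E is the third residue of the pattern (0-based offset 2)

def catalytic_glutamate_positions(sequence):
    """Return 1-based positions of the glutamate in every signature match."""
    return [m.start() + GLU_OFFSET + 1 for m in TIM_SIGNATURE.finditer(sequence)]

toy_sequence = "MSRAYEPVWAIGTGKT"           # flanking residues are invented
no_glutamate = toy_sequence.replace("YEP", "YQP")

print(catalytic_glutamate_positions(toy_sequence))
print(catalytic_glutamate_positions(no_glutamate))
```

Replacing the glutamate removes the match altogether, which is the behavior expected of a signature built around an active site residue.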

[1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 29:6609-6618(1990). [2] Knowles J.R. Nature 350:121-124(1991).

[1522] 640. Thymidine kinase cellular-type signature (TK) 

Thymidine kinase (TK) (EC 2.7.1.21) is a ubiquitous enzyme that catalyzes the ATP-dependent phosphorylation of thymidine. A comparison of TK sequences has shown [1,2,3] that there are two different families of TK. One family groups together TK from herpes viruses as well as cellular thymidylate kinases, while the second family currently consists of TK from the following sources:
- Vertebrates.
- Bacteria.
- Bacteriophage T4.
- Pox viruses.
- African swine fever virus (ASF).
- Fish lymphocystis disease virus (FLDV).

A conserved region which is located in the C-terminal section of these enzymes has been selected as a signature pattern for this family of TK.

Consensus pattern: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]
[1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seigman L.J., Both G.W. Virology 156:355-365(1987). [2] Blasco R., Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C., Vinuela E. Virology 178:301-304(1990). [3] Robertson G.R., Whalley J.M. Nucleic Acids Res. 16:11303-11317(1988).

[1523] 641. Thymidine kinase from herpesvirus (TK herpes)
[1] Medline: 96003730. Crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with deoxythymidine and ganciclovir. Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC, Sanderson MR; Nat Struct Biol 1995;2:876-881.
Number of members: 65

[1524] 642. Nuclear transition protein 2 signatures (TP2)

In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is associated with a double-protein transition. The first transition corresponds to the replacement of histones by several spermatid-specific proteins, also called transition proteins, which are themselves replaced by protamines during the second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic, zinc-binding protein [1] of 116 to 137 amino-acid residues. Structurally, TP2 consists of three distinct parts: a conserved serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains cysteine residues, and a conserved C-terminal domain of about 70 residues rich in lysines and arginines. Two signature patterns for TP2 have been developed: one located in the N-terminal domain, the other in the C-terminal domain.

Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S

Consensus pattern: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K
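
When a family has one signature in each terminal domain, a candidate can be required to carry both, in the expected order. The following sketch illustrates that check. It assumes the first pattern ends in P-Q-S, which is a reading of a damaged original, and the names `TP2_N_TERMINAL`, `TP2_C_TERMINAL` and `looks_like_tp2` and all sequences are invented for this example.

```python
import re

# H-x(3)-H-S-[NS]-S-x-P-Q-S (N-terminal domain; final P-Q-S is an assumed reading)
TP2_N_TERMINAL = re.compile(r"H.{3}HS[NS]S.PQS")
# K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K (C-terminal domain)
TP2_C_TERMINAL = re.compile(r"K.RK.{2}EGK.{2}K[KR]K")

def looks_like_tp2(sequence):
    """True if both signatures occur, the N-terminal one entirely first."""
    n_hit = TP2_N_TERMINAL.search(sequence)
    c_hit = TP2_C_TERMINAL.search(sequence)
    return bool(n_hit and c_hit and n_hit.end() <= c_hit.start())

n_block = "HAAAHSNSAPQS"      # synthetic match to the first pattern
c_block = "KARKAAEGKAAKKK"    # synthetic match to the second pattern
linker = "G" * 10             # stands in for the variable central domain

both_in_order = "M" + n_block + linker + c_block + "A"
wrong_order = "M" + c_block + linker + n_block + "A"
n_terminal_only = "M" + n_block + linker

for candidate in (both_in_order, wrong_order, n_terminal_only):
    print(looks_like_tp2(candidate))
```

Requiring both signatures is stricter than either alone; whether that trade-off is appropriate depends on how complete the sequences being scanned are.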

[1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991).
[1525] 643. Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown [1] that some of these enzymes are structurally related. These related TPP enzymes are:
- Pyruvate oxidase (POX) (EC 1.2.3.3). Reaction catalyzed: pyruvate + orthophosphate + O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2).
- Pyruvate decarboxylase (PDC) (EC 4.1.1.1). Reaction catalyzed: pyruvate = acetaldehyde + CO(2).
- Indolepyruvate decarboxylase (EC 4.1.1.74) [2]. Reaction catalyzed: indole-3-pyruvate = indole-3-acetaldehyde + CO(2).
- Acetolactate synthase (ALS) (EC 4.1.3.18). Reaction catalyzed: 2 pyruvate = acetolactate + CO(2).
- Benzoylformate decarboxylase (BFD) (EC 4.1.1.7) [3]. Reaction catalyzed: benzoylformate = benzaldehyde + CO(2).

A conserved region which is located in their C-terminal section has been selected as a signature pattern for these enzymes.
Consensus pattern: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC]
[1] Green J.B.A. FEBS Lett. 246:1-5(1989). [2] Koga J., Adachi T., Hidaka H. Mol. Gen. Genet. 226:10-16(1991). [3] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[1526] 644. TPR Domain
[1] Medline: 95397415. Tetratrico peptide repeat interactions: to TPR or not to TPR? Lamb JR, Tugendreich S, Hieter P; Trends Biochem Sci 1995;20:257-259.
[2] Medline: 98151343. The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein-protein interactions. Das AK, Cohen PW, Barford D; EMBO J 1998;17:1192-1199.

Number of members: 621

[1527] 645. Uroporphyrin-III C-methyltransferase signatures (TP methylase)

Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of two methyl groups from
S-adenosyl-L-methionine to the C-2 and C-7 atoms of uroporphyrinogen III to yield precorrin-2 via the intermediate
formation of precorrin-1.

SUMT is the first enzyme specific to the cobalamin pathway and precorrin-2 is a common intermediate in the
biosynthesis of corrinoids such as vitamin B12, siroheme and coenzyme F430. The sequences of SUMT from a variety of
eubacterial and archaebacterial species are currently available. In species such as Bacillus megaterium (gene cobA),
Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA) SUMT is a protein of about 25 to 30 Kd.
In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a
multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into siroheme,
and a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans
and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also seem to be SAM-dependent
methyltransferases [3,4]. The similarity is especially strong with two of these enzymes: cobI/cbiL, which encodes
S-adenosyl-L-methionine-precorrin-2 methyltransferase, and cobM/cbiF, whose exact function is not known. Two signature
patterns have been developed for these enzymes. The first corresponds to a well conserved region in the N-terminal
extremity (called region 1 in [1,3]) and the second to a less conserved region located in the central part of
these proteins (this pattern spans what are called regions 2 and 3 in [1,3]).

Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]
Consensus pattern: V-x(2)-[Llhx(2)-G-D-x(3HFyVV]-[GS]-x(8HUN^-x(5,6HLI\^ 

[1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J. Bacteriol. 173:4637-4645(1991).
[2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991).
[3] Crouzet J., Cameron B., Cauchois L., Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol.
172:5980-5990(1990). [4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J. Bacteriol. 175:
3303-3316(1993). [5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. Biol. 12:4026-4037(1992).
[1528] 646. Tudor domain 

Domain of unknown function present in several RNA-binding proteins, copies in the Drosophila Tudor protein. Slight 
ambiguities in the alignment. Number of members: 18

[1] Medline: 97200561 Tudor domains in proteins that interact with RNA. Ponting CP; Trends Biochem Sci 1997;22:51-52.

[2] Medline: 97157029 The human EBNA-2 coactivator p100: multidomain organization and relationship to the
staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I,
Mornon JP; Biochem J 1997;321:125-132.
[1529] 647. Terpene synthase family

It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six
subgroups on the basis of phylogeny, called tpsa-tpsf.

tpsa includes vetispiradiene synthase, Swiss:Q39979, 5-epi-aristolochene synthase, Swiss:Q40577 and
(+)-delta-cadinene synthase, Swiss:P93665.

tpsb includes (-)-limonene synthase, Swiss:Q40322.

tpsc includes kaurene synthase A, Swiss:O04408.

tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase, Swiss:O24475 and myrcene synthase, Swiss:O24474.

tpse includes kaurene synthase B.

tpsf includes linalool synthase.

Number of members: 51
[1]

Medline: 97413772

Monoterpene synthases from grand fir (Abies grandis). cDNA isolation, characterization, and functional expression of
myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R;

J Biol Chem 1997;272:21784-21792.
[1530] 648. ThiF family

This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial
ThiF/MoeB/HesA family. Number of members: 87
[1531] 649. Thioester dehydrase

Members of this family are involved in fatty acid biosynthesis.
Number of members: 19
[1]

Medline: 96398612

Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two
catalytic activities in one active site.
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL;
Structure 1996;4:253-264.
Database reference: SCOP; 1mka; fa; [SCOP-USA] [CATH-PDBSUM]
Database reference: PFAMB; PB058036;
[1532] 650. Tub family signatures

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. This mutation
maps to a gene, tub [1,2], which codes for a protein that belongs to a family which currently consists of the following
members: - Mammalian tub, a hydrophilic protein of about 500 residues, which could be involved in the hypothalamic
regulation of body weight. - Human protein TULP1 [3], which may be involved in retinitis pigmentosa 14, a retinal
degeneration disease. - Mouse protein p4-6, whose function is not known. - Caenorhabditis elegans hypothetical protein
F10B5.4. - Several fragmentary sequences from plants, Drosophila and human ESTs. While the N-terminal part of
these proteins is conserved neither in length nor in sequence, the C-terminal 250 residues are highly conserved.
Therefore, two regions were selected in the C-terminal part as signature patterns. The second region is located at the
C-terminal extremity and contains a penultimate cysteine residue that could be critical to the normal functioning of
these proteins.

Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q 

Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E

[1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R., Misumi D.J., Holmgren L., Charlat
O., Woolf E.A., Tayber O., Brody T., Shu P., Hawkins F., Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J.,
Lakey N.D., Culpepper J., Chen K., Gluecksmann-Kus M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290(1996).
[2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996). [3] North M.A., Naggert
J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci. U.S.A. 94:3128-3133(1997).
[1533] 651. Eukaryotic DNA topoisomerase I active site

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type I topoisomerases act by catalyzing the transient breakage of DNA, one strand at a
time, and the subsequent rejoining of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone
bond, it simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is joined to a
3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotes and pox virus topoisomerases I,
there are a number of conserved residues in the region around the active site tyrosine.
Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM] [Y is the active site tyrosine]
[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). [2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47
(1995). [3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989). [4]
Roca J. Trends Biochem. Sci. 20:156-160(1995). [E1]
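
The note "[Y is the active site tyrosine]" can be made operational by transcribing the consensus pattern into a regular expression and capturing the tyrosine in a named group. The following is a minimal sketch, not part of the original disclosure; the function name, the group name and the test sequence are hypothetical and chosen only for illustration.

```python
import re

# Regular-expression transcription of
# [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM].
# The active site tyrosine is captured in the named group "tyr".
TOPO1_SITE = re.compile(r"[DEN].{6}[GS][IT]SK.{2}(?P<tyr>Y)[LIVM].{3}[LIVM]")


def active_site_tyrosines(sequence):
    """Return 1-based positions of tyrosines occupying the pattern's Y slot."""
    return [m.start("tyr") + 1 for m in TOPO1_SITE.finditer(sequence)]


# Hypothetical sequence constructed to satisfy the pattern (not a real protein).
print(active_site_tyrosines("MADAAAAAAGTSKAAYLAAAVKK"))
```

Because `finditer` reports non-overlapping hits only, a sequence with overlapping candidate sites would need a lookahead-based scan instead.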
[1534] 652. Transaldolase signatures

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose
7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together
with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an
enzyme of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated
[1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of
fructose-6-phosphate. Transaldolase is evolutionarily related [2] to a bacterial protein of about 20 Kd (known as talC
in Escherichia coli), whose exact function is not yet known. Two signature patterns have been developed for these
proteins. The first, located in the N-terminal section, contains a perfectly conserved pentapeptide; the second
includes the active site lysine.

Consensus pattern: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2)

Consensus pattern: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-x-[AGV]-x-[Q
[K is the active site residue]

[1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993). [2] Reizer J.,
Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).
[1535] 653. (Transpeptidase) Penicillin binding protein transpeptidase domain
[1536] The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this family.
[1537] [1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O; Nat Struct Biol 1996;3:284-289.

[1538] 654. Trehalase signatures

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose,
yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms and whose sequence has been
highly conserved throughout evolution. Two of the most highly conserved regions have been selected as signature
patterns. The first pattern is located in the central section, the second one is in the C-terminal region.

Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y

Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P

[1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993). [2] Henrissat B., Bairoch A. Biochem. J. 293:
781-788(1993). [E1]
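
When a family is described by two signature patterns, a sequence can be screened against both. The sketch below is illustrative only and rests on an assumption that is not stated in the text: a hit is reported only when the first (central) pattern is found and the second (C-terminal) pattern occurs downstream of it. The function name and the test sequences are hypothetical.

```python
import re

# Regular-expression transcriptions of the two trehalase patterns given above.
CENTRAL = re.compile(r"PGGRF.E.Y.WD.Y")        # P-G-G-R-F-x-E-x-Y-x-W-D-x-Y
C_TERMINAL = re.compile(r"QWD.P.[GA]W[PAS]P")  # Q-W-D-x-P-x-[GA]-W-[PAS]-P


def has_trehalase_signature(sequence):
    """True if the central pattern occurs and the C-terminal pattern follows it."""
    first = CENTRAL.search(sequence)
    if first is None:
        return False
    # Assumption: only look for the second pattern after the end of the first hit.
    return C_TERMINAL.search(sequence, first.end()) is not None


# Hypothetical fragments built to satisfy each pattern (not real trehalase sequence).
central_hit = "PGGRFAEAYAWDAY"
c_terminal_hit = "QWDAPAGWPP"
print(has_trehalase_signature("MK" + central_hit + "LLLL" + c_terminal_hit + "K"))
```

Dropping the ordering requirement is a one-line change (search the whole sequence for the second pattern), which may be preferable for fragmentary sequences.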

[1539] 655. Trehalose-6-phosphate synthase domain
[1540] OtsA (Trehalose-6-phosphate synthase) is homologous to regions in the subunits of yeast
trehalose-6-phosphate synthase/phosphate complex [1].

[1541] [1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15.
[1542] 656. Tropomyosins signature
Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle,
tropomyosins mediate the interactions between the troponin complex and actin so as to regulate muscle contraction.
The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein
that forms a coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues
and a highly conserved N-terminal region, whereas non-muscle forms are generally smaller and are heterogeneous
in their N-terminal region. The signature pattern for tropomyosins is based on a very conserved region in the
C-terminal section of tropomyosins which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E

[1] Smilie L.B. Trends Biochem. Sci. 4:151-155(1979). [2] McLeod A.R. BioEssays 6:208-212(1986).
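
The consensus patterns throughout this section use one notation: elements are separated by "-", "x" stands for any residue, "[...]" lists the allowed residues, "{...}" lists excluded residues, and "(n)" or "(n,m)" gives a repeat count or range. A minimal sketch of how such a pattern could be translated into a regular expression follows. It is an illustration under those assumptions, not a complete parser for the notation, and the function name and test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a consensus pattern such as L-K-E-A-E-x-R-A-E into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate the residue specification from an optional (n) or (n,m) suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        if m is None:
            raise ValueError("empty pattern element")
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            atom = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            atom = core                     # any one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            atom = "[^" + core[1:-1] + "]"  # any residue except those listed
        elif len(core) == 1 and core.isalpha():
            atom = core                     # a single fixed residue
        else:
            raise ValueError("unrecognised element: %r" % element)
        if lo is not None:
            atom += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(atom)
    return "".join(parts)


regex = prosite_to_regex("L-K-E-A-E-x-R-A-E")
# Hypothetical test sequence containing one occurrence of the tropomyosin pattern.
hit = re.search(regex, "MDAIKLKEAEARAEFAER")
print(regex, hit.start() + 1)
```

The sketch deliberately rejects anything it does not recognise, so patterns that were damaged in transcription fail loudly rather than being silently mistranslated.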
[1543] 657. Troponin

Troponin (Tn) contains three subunits, Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT). This Pfam
contains members of the TnT subunit.

Troponin is a complex of three proteins, Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT).
The troponin complex regulates Ca2+ induced muscle contraction.
This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin.
Number of members: 81

[1] Medline: 87144593

Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.

[2] Medline: 95155315

A direct regulatory role for troponin T and a dual role for troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;
J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796

The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;
FASEB J 1995;9:755-767.
[1544] 658. (Tryp mucin) Mucin-like glycoprotein

[1545] This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of three regions. The
N and C termini are conserved between all members of the family, whereas the central region is not well conserved
and contains a large number of threonine residues which can be glycosylated [1].

Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that
were proposed to be involved in the interaction with, and invasion of, mammalian host cells.

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149.

[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

[1546] 659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
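
The class-I pattern contains a variable-length element, x(0,2), so the proline may be followed by zero, one or two arbitrary residues before the rest of the 'HIGH' region. The sketch below shows how that range behaves when the pattern is written as a regular expression. It is an illustration only; the function name and the short test sequences are hypothetical and are not real synthetase sequences.

```python
import re

# Regular-expression transcription of
# P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC].
CLASS_I = re.compile(
    r"P.{0,2}[GSTAN][DENQGAPK].[LIVMFP][HT][LIVMYAC]G[HNTG][LIVMFYSTAGPC]"
)


def find_high_regions(sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in CLASS_I.finditer(sequence)]


# No gap residues after the proline.
print(find_high_regions("MKPSGYLHIGHARS"))
# Two gap residues (the maximum the pattern allows).
print(find_high_regions("PKKSGYLHIGHA"))
# Three gap residues: outside the x(0,2) range, so nothing is reported.
print(find_high_regions("PKKKSGYLHIGHA"))
```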

[1547] 660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1548] 661. (tRNA-synt 1c) tRNA synthetases class I (E and Q)

[1549] Other tRNA synthetase sub-families are too dissimilar to be included.

This family includes only glutamyl and glutaminyl tRNA synthetases.

In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).
[1550] [1] Rath VL, Silvian LF, Beijer B, Sproat BS, Steitz TA; Structure 1998;6:439-449.
[1551] 662. (tRNA-synt 1d) tRNA synthetases class I (R)
[1552] Other tRNA synthetase sub-families are too dissimilar to be included.
This family includes only arginyl tRNA synthetase. 

[1553] 663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity, however at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LI
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3]
Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:
8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S.
Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1554] 664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1e)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras
D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F.
Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA synt 2b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their
catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold observed for the class
I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity, however at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LI

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993). [3]
Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:
8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S.
Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1556] 666. Thaumatin family signature 

Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on a molar basis) from
Thaumatococcus daniellii, an African bush. The protein is made of about 200 residues and contains 8 disulfide bonds.
A number of proteins have been found to be related to thaumatins. These proteins are listed below (references are only
provided for recently determined sequences). - A maize alpha-amylase/trypsin inhibitor. - Two tobacco
pathogenesis-related proteins: PR-R major and minor forms, which are induced after infection with viruses. -
Salt-induced protein NP24 from tomato. - Osmotin, a salt-induced protein from tobacco. - Osmotin-like proteins OSML13,
OSML15 and OSML81 from potato [2]. - P21, a leaf protein from soybean. - PWIR2, a leaf protein from wheat. - Zeamatin,
a maize antifungal protein [3]. The exact biological function of all these proteins is not yet known. A conserved region
that includes three cysteine residues known (in thaumatin) to be involved in disulfide bonds has been selected as a
signature pattern.

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C

[1] Edens L., Heslinga L., Klok R., Ledeboer A.M., Maat J., Toonen M.Y., Visser C., Verrips C.T. Gene 18:1-12(1982).
[2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol. 108:929-937(1995). [3] Malehorn D.E., Borgmeyer J.R., Smith C.E.,
Shah D.M. Plant Physiol. 106:1471-1481(1994).
[1557] 667. Thiolases signatures

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase (EC
2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad
chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation.
Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and involved in
biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two
forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes. There are two conserved
cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is
involved in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active
site base involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP)
(also known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 Kd protein
(SCP-2) and a larger 58 Kd protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in
lipid transport; the latter is found in peroxisomes.

The C-terminal part of SCP-x is identical to SCP-2 while the N-terminal portion is evolutionarily related to thiolases [4].
Three signature patterns have been developed for this family of proteins, two of which are based on the regions around
the biologically important cysteines. The third is based on a highly conserved region in the C-terminal part of these
proteins.

Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM] [C is involved
in formation of acyl-enzyme intermediate]
Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

Consensus pattern: [AG]-[LIVMA]-[STAGCLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG] [C is the active site
residue]

[1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989). [2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G.,
Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990). [3] Igual J.C., Gonzalez-Bosch C., Dopazo J.,
Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992). [4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol.
10:695-698(1991).

[1558] 668. Thioredoxin family active site 

Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues which participate in various
redox reactions via the reversible oxidation of an active center disulfide bond. They exist in either a reduced form or
an oxidized form where the two cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present
in prokaryotes and eukaryotes and the sequence around the redox-active disulfide bond is well conserved. Bacteriophage
T4 also encodes a thioredoxin but its primary structure is not homologous to bacterial, plant and vertebrate
thioredoxins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; all of them seem to
be protein disulphide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme that catalyzes
the rearrangement of disulfide bonds in various proteins. The various forms of PDI which are currently known are: -
PDI major isozyme; a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase (EC
1.14.11.2), as a component of oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as
glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein. - ERp60 (ER-60; 58 Kd
microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and
later to be a protease. - ERp72. - P5. All PDI contain two or three (ERp72) copies of the thioredoxin domain. Bacterial
proteins that act as thiol:disulfide interchange proteins that allow disulfide bond formation in some periplasmic
proteins also contain a thioredoxin domain. These proteins are: - Escherichia coli dsbA (or prfA) and its orthologs in
Vibrio cholerae (tcpG) and Haemophilus influenzae (por). - Escherichia coli dsbC (or xprA) and its orthologs in Erwinia
chrysanthemi and Haemophilus influenzae. - Escherichia coli dsbD (or dipZ) and its Haemophilus influenzae ortholog.
- Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus influenzae, Rhodobacter capsulatus (helX) and
Rhizobiaceae (cycY and tlpA).

Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT]
[The two C's form the redox-active bond]

[1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985). [2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev. 54:
271-297(1988). [3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989). [4] Eklund H., Gleason F.K., Holmgren A.
Proteins 11:13-28(1991). [5] Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).
[6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989). [7] Freedman R.B., Hirst T.R., Tuite M.F.
Trends Biochem. Sci. 19:331-336(1994).

[1559] 669. (Transcript fac2) Transcription factor TFIIB repeat signature

In eukaryotes the initiation of transcription of protein encoding genes by polymerase II is modulated by general and
specific transcription factors. The general transcription factors operate through common promoter elements (such as
the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID,
-IIE, -IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the transcription of class II
genes; it associates with a complex of TFIID-IIA bound to DNA (DA complex) to form a ternary complex TFIID-IIA-IIB (DAB
complex) which is then recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75 residues. This repeat could
contribute an element of symmetry to the folded protein. The following proteins have been shown to be evolutionarily
related to TFIIB: - An archaebacterial TFIIB homolog. In Pyrococcus woesei a previously undetected open reading
frame has been shown [4] to be highly related to TFIIB. - Fungal transcription factor IIIB 70 Kd subunit (gene
PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA polymerase III transcription and plays a role analogous
to that of TFIIB in pol III transcription. The central section of the repeated domain, which is the most conserved part
of that domain, has been selected as a signature pattern.

Consensus pattern: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-[GSA]-[STAC]
[1] Weinmann R. Gene Expr. 2:81-91(1992). [2] Hawley D. Trends Biochem. Sci. 16:317-318(1991). [3] Ha I., Lane
W.S., Reinberg D. Nature 352:689-695(1991). [4] Ouzounis C., Sander C. Cell 71:189-190(1992). [5] Khoo B., Brophy
B., Jackson S.P. Genes Dev. 8:2879-2890(1994).
[1560] 670. (transcritp fact) MADS-box domain signature and profile 

A number of transcription factors contain a conserved domain of 56 amino-acid residues, sometimes known as the
MADS-box domain [E1]. They are listed below: - Serum response factor (SRF) [1], a mammalian transcription factor
that binds to the Serum Response Element (SRE). This is a short sequence of dyad symmetry located 300 bp to the
5' end of the transcription initiation site of genes such as c-fos. - Mammalian myocyte-specific enhancer factors 2A to
2D (MEF2A to MEF2D). These proteins are transcription factors which bind specifically to the MEF2 element present
in the regulatory regions of many muscle-specific genes. - Drosophila myocyte-specific enhancer factor 2 (MEF2). -
Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of mating-type-specific genes. - Yeast arginine
metabolism regulation protein I (gene ARGR1 or ARG80). - Yeast transcription factor RLM1. - Yeast transcription factor
SMP1. - Arabidopsis thaliana agamous protein (AG) [3], a probable transcription factor involved in regulating genes
that determine stamen and carpel development in wild-type flowers. Mutations in the AG gene result in the replacement
of the stamens by petals and the carpels by a new flower. - Arabidopsis thaliana homeotic proteins Apetala1 (AP1),
Apetala3 (AP3) and Pistillata (PI), which act locally to specify the identity of the floral meristem and to determine sepal
and petal development [4]. - Antirrhinum majus and tobacco homeotic proteins deficiens (DEFA) and globosa (GLO)
[5]. Both proteins are transcription factors involved in the genetic control of flower development. Mutations in DEFA or
GLO cause the transformation of petals into sepals and of stamina into carpels. - Arabidopsis thaliana putative
transcription factors AGL1 to AGL6 [6]. - Antirrhinum majus morphogenetic protein DEF H33 (squamosa). In SRF, the
conserved domain has been shown [1] to be involved in DNA-binding and dimerization. A pattern that spans the
complete length of the domain has been derived. The profile also spans the length of the MADS-box.
Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-
[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]

[1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988). [2] Passmore S., Maine G.T., Elble R.,
Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988). [3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M. Nature 346:35-39(1990). [4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994). [5] Troebner
W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwartz-Sommer Z. EMBO J.
11:4693-4704(1992). [6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991). [E1]
[1561] 671. Transketolase signatures

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate
to an aldose receptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde
3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate
pathways. TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer
of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that
the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha,
there is a highly related enzyme, dihydroxy-acetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde
transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants (gene
CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms
2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (dxp), a precursor in
the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily
related to TK. Two regions of TK have been selected as signature patterns. The first, located in the N-terminal section,
contains a histidine residue which appears to function in proton transfer during catalysis [4]. The second, located in the
central section, contains conserved acidic residues that are part of the active cleft and may participate in
substrate-binding [4].

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS
Consensus pattern: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-[STA
[1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. Commun. 183:1159-1166
(1992). [2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., Martin B.M. Biochemistry 31:1892-1896(1992). [3]
Sprenger G.A., Schoerken U., Wiegert T., Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm
H. Proc. Natl. Acad. Sci. U.S.A. 94:12857-12862(1997). [4] Lindqvist Y., Schneider G., Ermler U., Sundstroem M. EMBO
J. 11:2373-2379(1992).

[1562] 672. Transmembrane 4 family signature 

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins
known to belong to this family are listed below: - Mammalian antigen CD9 (MIC3), a protein involved in platelet activation
and aggregation. - Mammalian leukocyte antigen CD37, expressed on B lymphocytes. - Mammalian leukocyte antigen
CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells. - Mammalian lysosomal membrane
protein CD63 (melanoma-associated antigen ME491; antigen AD1). - Mammalian antigen CD81 (cell surface protein
TAPA-1), which may play an important role in the regulation of lymphoma cell growth. - Mammalian antigen CD82
(protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers costimulatory signals for
the TCR/CD3 pathway. - Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)). -
Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1). - Mammalian novel antigen 2 (NAG-2). - Human
tumor-associated antigen CO-029. - Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23). These
proteins share the following characteristics: they all seem to be type III membrane proteins (type III proteins are integral
membrane proteins that contain an N-terminal membrane-anchoring domain which is not cleaved during biosynthesis
and which functions both as a translocation signal and as a membrane anchor); they also contain three additional
transmembrane regions and at least seven conserved cysteine residues, and are of approximately the same size (218
to 284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' (TM4) because they span
the plasma membrane four times. A schematic diagram of the domain structure of these proteins is shown below.

Domain labels recoverable from the diagram, left to right: TMa | Extra | TM2 | Cyt | TM3 | Extracellular | TM4 | Cyt
Cyt: cytoplasmic domain. TMa: transmembrane anchor. TM2 to TM4: transmembrane regions 2 to 4. 'C': conserved cysteine. '*': position of the pattern.
A conserved region that includes two cysteines and seems to be located in a short cytoplasmic loop between two
transmembrane domains has been selected as a signature for these proteins.

Consensus pattern: Q-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2)
[1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem. 266:14597-14602(1991). [2] Tomlinson M.G.,
Williams A.F., Wright M.D. Eur. J. Immunol. 23:136-140(1993). [3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers A.D.,
Davis S.J., Somoza C., Williams A.F. The leucocyte antigen factbooks. Academic Press, London / San Diego, (1993).
[1563] 673. Tryptophan synthase alpha chain signature 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. A conserved region
that contains three conserved acidic residues has been selected as a signature pattern for the alpha chain. The first
and the third acidic residues are believed to serve as proton donors/acceptors in the enzyme's catalytic mechanism.
Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1564] 674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the aldol
cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. The beta chain of
the enzyme requires pyridoxal-phosphate as a cofactor. The pyridoxal-phosphate group is attached to a lysine residue.
The region around this lysine residue also contains two histidine residues which are part of the pyridoxal-phosphate
binding site. The signature pattern for the tryptophan synthase beta chain is derived from that conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1565] 675. Serine proteases, trypsin family, active sites 

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an
aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The sequences in
the vicinity of the active site serine and histidine residues are well conserved in this family of proteases [1]. A partial
list of proteases known to belong to the trypsin family is shown below. - Acrosin. - Blood coagulation factors VII, IX, X,
XI and XII, thrombin, plasminogen, and protein C. - Cathepsin G. - Chymotrypsins. - Complement components C1r,
C1s, C2, and complement factors B, D and I. - Complement-activating component of RA-reactive factor. - Cytotoxic
cell proteases (granzymes A to H). - Duodenase I. - Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin). -
Enterokinase (EC 3.4.21.9) (enteropeptidase). - Hepatocyte growth factor activator. - Hepsin. - Glandular (tissue)
kallikreins (including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen
(PSA) and tonin). - Plasma kallikrein. - Mast cell proteases (MCP) 1 (chymase) to 6. - Myeloblastin (proteinase 3)
(Wegener's autoantigen). - Plasminogen activators (urokinase-type, and tissue-type). - Trypsins I, II, III, and IV. -
Tryptases. - Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator. -
Collagenase from common cattle grub and collagenolytic protease from Atlantic sand fiddler crab. - Apolipoprotein(a). -
Blood fluke cercarial protease. - Drosophila trypsin-like proteases: alpha, easter, snake-locus. - Drosophila protease
stubble (gene sb). - Major mite fecal allergen Der p III. All the above proteins belong to family S1 in the classification
of peptidases [2,E1] and originate from eukaryotic species. It should be noted that bacterial proteases that belong to
family S2A are similar enough in the regions of the active site residues that they can be picked up by the same patterns.
These proteases are listed below. - Achromobacter lyticus protease I. - Lysobacter alpha-lytic protease. - Streptogrisin
A and B (Streptomyces proteases A and B). - Streptomyces griseus glutamyl endopeptidase II. - Streptomyces fradiae
proteases 1 and 2.

Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C [H is the active site residue]
Consensus pattern: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
[S is the active site residue]

[1] Brenner S. Nature 334:528-530(1988). [2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
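The two trypsin-family patterns each mark one active-site residue, so a scan can report where the candidate histidine and serine fall in a sequence. The following is a minimal sketch, assuming the patterns are hand-translated into Python regular expressions; the function name and the toy sequence in the usage note are hypothetical and are not part of the original disclosure.

```python
import re

# Hand-translated regular expressions for the two trypsin-family patterns.
# The parenthesised residue is the one annotated as the active site.
HIS_SITE = re.compile(r"[LIVM][ST]A[STAG](H)C")
SER_SITE = re.compile(
    r"[DNSTAGC][GSTAPIMVQH].{2}G[DE](S)G[GS][SAPHV][LIVMFYWH][LIVMFYSTANQH]"
)


def active_site_positions(protein: str) -> dict:
    """Return 1-based positions of candidate active-site His and Ser residues."""
    his = [m.start(1) + 1 for m in HIS_SITE.finditer(protein)]
    ser = [m.start(1) + 1 for m in SER_SITE.finditer(protein)]
    return {"His": his, "Ser": ser}
```

For example, calling `active_site_positions("AAVSAAHCGGGGDSCQGDSGGPVVKK")` on this made-up fragment reports one histidine and one serine candidate. A hit only indicates that the local sequence fits the pattern; it does not by itself show that the protein is catalytically active.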
[1566] 676. (tsp) Thrombospondin type 1 domain
[1567] [1] Bork P; FEBS Lett 1993;327:125-130.

[1568] 677. Tubulin subunits alpha, beta, and gamma signature 

Tubulins [1,2], the major constituent of microtubules, are dimeric proteins which consist of two closely related subunits
(alpha and beta). Tubulin binds two molecules of GTP at two different sites (N and E). At the E (Exchangeable) site,
GTP is hydrolyzed during incorporation into the microtubule. Near the E site is an invariant region rich in glycines which
is found in both chains and which is now [3] said to control the access of the nucleotide to its binding site. A signature
pattern was developed from this region. With the exception of the simple eukaryotes, most species express a variety
of closely related alpha and beta isotypes. In most species there is a third member of the tubulin family: gamma tubulin.
Gamma tubulin is found at microtubule organizing centers (MTOC) such as the spindle poles or the centrosome,
suggesting that it is involved in the minus-end nucleation of microtubule assembly [4].
Consensus pattern: [SAG]-G-G-T-G-[SA]-G

[1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985). [2] Joshi H.C., Cleveland D.W. Cell Motil.
Cytoskeleton 16:159-163(1990). [3] Hesse J., Thierauf M., Ponstingl H. J. Biol. Chem. 262:15472-15475(1987). [4]
Joshi H.C. BioEssays 15:637-643(1993).
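The consensus patterns in these entries use a compact notation: `x` for any residue, square brackets for allowed residues, curly braces for excluded residues, a parenthesised count or range for repeats, and `<` or `>` for sequence ends. The following is a minimal sketch of how such a pattern might be translated into a Python regular expression and used to scan a sequence; the function name `prosite_to_regex` and the test sequence are illustrative assumptions, and only the notation elements listed here are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.lstrip("<").rstrip(">")
        # Separate the residue specification from an optional repeat, e.g. (2) or (4,6).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                     # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        elif len(core) == 1 and core.isalpha():
            token = core                     # a single fixed residue
        else:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


# Scan a made-up sequence with the tubulin signature.
tubulin_regex = re.compile(prosite_to_regex("[SAG]-G-G-T-G-[SA]-G"))
hit = tubulin_regex.search("MREAGGGTGSGLK")
if hit:
    print(hit.start() + 1, hit.group())
```

Patterns that are truncated or garbled in the text cannot be translated this way; the function raises `ValueError` on elements it does not recognise rather than guessing.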

[1569] Tubulin-beta mRNA autoregulation signal

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. Unpolymerized tubulin
subunits bind directly (or activate a factor(s) which binds co-translationally) to the nascent N-terminus of beta-tubulin. This
binding is transduced through the adjacent ribosomes to activate an RNAse that degrades the polysome-bound mRNA.
The recognition element has been shown to be the first four amino acids of beta-tubulin: Met-Arg-Glu-Ile. Mutations
to this sequence abolish the autoregulation effect (except for the replacement of Glu by Asp); transposition of this
sequence to an internal region of a polypeptide also suppresses the autoregulatory effect.
Consensus pattern: <M-R-[DE]-[IL]

[1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988).
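The autoregulation signal differs from the other patterns in that it is anchored to the N-terminus: the same four residues placed internally do not count. The following is a minimal sketch of that check, assuming the pattern is written as an anchored Python regular expression; the function name and the short test sequences are illustrative only.

```python
import re

# '<' in the consensus pattern means "start of the sequence", hence the '^' anchor.
AUTOREGULATION_SIGNAL = re.compile(r"^MR[DE][IL]")


def has_autoregulation_signal(protein: str) -> bool:
    """Return True if the sequence begins with the tubulin-beta recognition element."""
    return AUTOREGULATION_SIGNAL.match(protein) is not None


# Illustrative cases mirroring the text above.
print(has_autoregulation_signal("MREIVHIQ"))    # wild-type start, Met-Arg-Glu-Ile
print(has_autoregulation_signal("MRDIVHIQ"))    # Glu replaced by Asp
print(has_autoregulation_signal("MRAIVHIQ"))    # Glu replaced by Ala
print(has_autoregulation_signal("AAMREIVHIQ"))  # element moved to an internal position
```

The first two calls match the pattern and the last two do not, which corresponds to the tolerated Glu-to-Asp replacement and to the loss of the effect on other mutations or on transposition to an internal region.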

[1570] 678. (tRNA-synt 2c) Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to
specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two
aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these
enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. The
synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and
threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic
domain for the binding of ATP and amino acid which is different from the Rossmann fold observed for the class I
synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.

Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVM
[1571] [1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687
(1993). [3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.
A. 86:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack
S. Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque R., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1572] 679. UBA-domain 

[1573] The UBA-domain (ubiquitin associated domain) is a novel sequence motif found in several proteins having
connections to ubiquitin and the ubiquitination pathway. The structure of the UBA domain consists of a compact three
helix bundle [1]. Number of members: 84

[1574] [1] Structure of a human DNA repair protein UBA domain that interacts with HIV-1 Vpr. Dieckmann T,
Withers-Ward ES, Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct Biol 1998;5:1042-1047.
[1575] 680. UBX domain 

Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. Number of members: 19
[1] The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Hofmann K,
Bucher P; Trends Biochem Sci 1996;21:172-173.

[1576] 681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The first class
consists of enzymes of about 25 Kd and is currently represented by: - Mammalian isozymes L1 and L3. - Yeast YUH1.
- Drosophila Uch. One of the active site residues of class-I UCH [3] is a cysteine. A signature pattern has been derived
from the region around that residue.
Consensus pattern: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA] [C is the active site residue]

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Johnston S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO
J. 16:3787-3796(1997). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1577] 682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative
active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
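Because family 2 members share only two regions of similarity, one simple screen is to require that a sequence match both signatures, including their variable-length gaps x(1,3) and x(4,5). The following is a minimal sketch under that assumption; the regular expressions are hand-translated from the two patterns, and the function name and toy sequences are hypothetical.

```python
import re

# Cysteine-containing region: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q
CYS_BOX = re.compile(r"G[LIVMFY].{1,3}[AGC][NASM].C[FYW][LIVMC][NST][SACV].[LIVMS]Q")
# Histidine-containing region: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y
HIS_BOX = re.compile(r"Y.L.[SAG][LIVMFT].{2}H.G.{4,5}GHY")


def matches_both_signatures(protein: str) -> bool:
    """Return True only if both family 2 signatures occur somewhere in the sequence."""
    return bool(CYS_BOX.search(protein)) and bool(HIS_BOX.search(protein))


# Made-up fragments: one carries both regions, the other only the cysteine region.
both = "MA" + "GLKNLGNTCYMNSVLQ" + "AAAA" + "YDLIAVVVHCGSLNSGHY" + "KR"
cys_only = "MA" + "GLKNLGNTCYMNSVLQ" + "AAAAKR"
print(matches_both_signatures(both), matches_both_signatures(cys_only))
```

Requiring both hits is stricter than either pattern alone; whether that is the appropriate criterion for a given search is a design choice, not something stated in the text.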
[1578] 683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2) 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as that of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins only share two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative
active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues]
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1579] 684. UDP-glycosyltransferases signature 

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of the glycosyl group from
a UTP-sugar to a small hydrophobic molecule. This family currently consists of: - Mammalian UDP-glucuronosyl
transferases (UDPGT) [1,2]. A large family of membrane-bound microsomal enzymes which catalyze the transfer of
glucuronic acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major
importance in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens. - A large
number of putative UDPGT from Caenorhabditis elegans. - Mammalian 2-hydroxyacylsphingosine
1-beta-galactosyltransferase [3] (also known as UDP-galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer
of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant
sphingolipids of the myelin membrane of the central nervous system and peripheral nervous system. - Plant flavonol
O(3)-glucosyltransferase. An enzyme [4] that catalyzes the transfer of glucose from UDP-glucose to a flavanol. This reaction
is essential and one of the last steps in anthocyanin pigment biosynthesis. - Baculovirus ecdysteroid
UDP-glucosyltransferase (EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from UDP-glucose to ecdysteroids,
which are insect molting hormones. The expression of egt in the insect host interferes with normal insect development
by blocking the molting process. - Prokaryotic zeaxanthin glucosyl transferase (gene crtX), an enzyme involved
in carotenoid biosynthesis that catalyzes the glycosylation reaction which converts zeaxanthin to
zeaxanthin-beta-diglucoside. - Streptomyces macrolide glycosyltransferases [6]. These enzymes specifically inactivate macrolide
antibiotics via 2'-O-glycosylation using UDP-glucose. These enzymes share a conserved domain of about 50 amino acid
residues located in their C-terminal section, from which a pattern has been extracted to detect them.
Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-[HNQ]-
[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-x(3)-[PA]-x(3)-[DES]-[QEHN]
[1] Dutton G.J. (In) Glucuronidation of drugs and other compounds, Dutton G.J., Ed., pp 1-78, CRC Press, Boca Raton,
(1980). [2] Burchell B., Nebert D.W., Nelson D.R., Bock K.W., Iyanagi T., Jansen P.L., Lancet D., Mulder G.J.,
Chowdhury J.R., Siest G., Tephly T.R., Mackenzie P.I. DNA Cell Biol. 10:487-494(1991). [3] Schulte S., Stoffel W. Proc. Natl.
Acad. Sci. U.S.A. 90:10265-10269(1993). [4] Furtek D., Schiefelbein J.W., Johnston R., Nelson O.E. Jr. Plant Mol. Biol.
11:473-481(1988). [5] O'Reilly D.R., Miller L.K. Science 245:1110-1112(1989). [6] Hernandez C., Olano C., Mendez C.,
Salas J.A. Gene 134:139-140(1993).
[1580] 685. UDP-glucose/GDP-mannose dehydrogenase family

[1581] The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes which possess the
ability to catalyze the NAD-dependent 2-fold oxidation of an alcohol to an acid without the release of an aldehyde
intermediate [2]. Number of members: 55

[1582] [1] Purification and characterization of guanosine diphospho-D-mannose dehydrogenase. A key enzyme in
the biosynthesis of alginate by Pseudomonas aeruginosa. Roychoudhury S, May TB, Gill JF, Singh SK, Feingold DS,
Chakrabarty AM; J Biol Chem 1989;264:9380-9385. [2] Properties and kinetic analysis of UDP-glucose dehydrogenase
from group A streptococci. Irreversible inhibition by UDP-chloroacetol. Campbell RE, Sala RF, van de Rijn I, Tanner
ME; J Biol Chem 1997;272:3416-3422.
[1583] 686. Uracil-DNA glycosylase signature 

Uracil-DNA glycosylase (EC 3.2.2.-) (UNG) [1] is a DNA repair enzyme that excises uracil residues from DNA by
cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of misincorporation of dUMP residues by DNA
polymerase or deamination of cytosine. The sequence of uracil-DNA glycosylase is extremely well conserved [2] in
bacteria and eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are also found
in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus and the mitochondria. Human UNG1
protein is transported to both the mitochondria and the nucleus [4]. The N-terminal 77 amino acids of UNG1 seem to
be required for mitochondrial localization [4], but the presence of a mitochondrial transit peptide has not been directly
demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region has been selected. This
region contains an aspartic acid residue which has been proposed, based on X-ray structures [5,6], to act as a general
base in the catalytic mechanism.

Consensus pattern: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y [D is the active site residue]

[1] Sancar A., Sancar G.B. Annu. Rev. Biochem. 57:29-67(1988). [2] Olsen L.C., Aasland R., Wittwer C.U., Krokan
H.E., Helland D.E. EMBO J. 8:3121-3125(1989). [3] Upton C., Stuart D.T., McFadden G. Proc. Natl. Acad. Sci. U.S.
A. 90:4518-4522(1993). [4] Slupphaug G., Markussen F.-H., Olsen L.C., Aasland R., Aarsaether N., Bakke O., Krokan
H.E., Helland D.E. Nucleic Acids Res. 21:2579-2584(1993). [5] Savva R., McAuley-Hecht K., Brown T., Pearl L. Nature
373:487-493(1995). [6] Mol C.D., Arvai A.S., Slupphaug G., Kavli B., Alseth I., Krokan H.E., Tainer J.A. Cell 80:869-878
(1995). [7] Muller S.J., Caradonna S. Biochim. Biophys. Acta 1088:197-207(1991). [8] Meyer-Siegler K., Mauro D.J.,
Seal G., Wurzer J., Deriel J.K., Sirover M.A. Proc. Natl. Acad. Sci. U.S.A. 88:8460-8464(1991). [9] Muller S.J.,
Caradonna S. J. Biol. Chem. 268:1310-1319(1993). [10] Barnes D.E., Lindahl T., Sedgwick B. Curr. Opin. Cell Biol. 5:424-433
(1993).

[1584] 687. Uncharacterized protein family UPF0001 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBL036c. - Caenorhabditis elegans hypothetical protein F09E5.8. - Bacillus subtilis hypothetical
protein ytmE. - Escherichia coli hypothetical protein yggS and HI0090, the corresponding Haemophilus influenzae
protein. - Helicobacter pylori hypothetical protein HP0395. - Mycobacterium tuberculosis hypothetical protein
MtCY270.20. - Synechocystis strain PCC 6803 hypothetical protein slr0556. - A Pseudomonas aeruginosa hypothetical
protein in pilT 5' region. - A Vibrio alginolyticus hypothetical protein in pilT 5' region. These are proteins of from 25 to
30 Kd which contain a number of conserved regions. The best conserved region, which is located in the first third of
these proteins, has been selected as a signature pattern.

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV]
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1585] 688. Uncharacterized protein family UPF0003 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli protein
aefA. - Escherichia coli hypothetical protein yggB. - Escherichia coli hypothetical protein yjeP and HI0195.1, the
corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein ynaI. - Bacillus subtilis hypothetical
protein yhdY. - Helicobacter pylori hypothetical protein HP0415. - Synechocystis strain PCC 6803 hypothetical protein
slr0639. - Archaeoglobus fulgidus hypothetical protein AF1546. - Methanococcus jannaschii hypothetical protein
MJ0170. - Methanococcus jannaschii hypothetical protein MJ1143. The size of these proteins ranges from 30 to 120 Kd.
They all contain a number of transmembrane regions. The best conserved region, which is located in and just after the
last potential transmembrane region, has been selected as a signature pattern.

Consensus pattern: G-[STIF]-V-x(2)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-P-N-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N

[1] Bairoch A. Unpublished observations (1997).

[1586] 689. Uncharacterized protein family UPF0004 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yliQ. - Escherichia coli hypothetical protein yleA and HI0019, the corresponding Haemophilus influenzae
protein. - Bacillus subtilis hypothetical protein yqeV. - Helicobacter pylori hypothetical protein HP0269. - Helicobacter
pylori hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in 16S RNA 5' region. - Mycobacterium
leprae hypothetical protein B2235_C2_195. - Pseudomonas aeruginosa hypothetical protein in hemL 3' region. -
Synechocystis strain PCC 6803 hypothetical protein slr0082. - Synechocystis strain PCC 6803 hypothetical protein
sll0996. - Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii hypothetical protein
MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The size of these proteins ranges from 47 to 61 Kd.
They contain six conserved cysteines, three of which are clustered in a region that can be used as a signature pattern.
Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G
[1] Bairoch A. Unpublished observations (1997).
[1587] 690. Uncharacterized protein family UPF0005 signature 

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT (Testis Enhanced Gene
Transcript). - Escherichia coli hypothetical protein yccA and HI0044, the corresponding Haemophilus influenzae protein. -
A probable Pseudomonas aeruginosa ortholog of yccA. These are proteins of about 25 Kd which seem to contain
seven transmembrane domains. A signature pattern that corresponds to a region that starts with the beginning of the
third transmembrane domain and ends in the middle of the fourth one has been developed.

Consensus pattern: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-[LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F

[1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995).
[1588] 691. Uncharacterized protein family UPF0006 signatures

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBL055c. - Escherichia coli hypothetical protein ycfH and HI0454, the corresponding Haemophilus
influenzae protein. - Escherichia coli hypothetical protein yigW. - Escherichia coli hypothetical protein yjjV and HI0081,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. - Haemophilus
influenzae hypothetical protein HI1664. - Mycoplasma genitalium hypothetical protein MG009. These are proteins of from
24 to 47 Kd which contain a number of conserved regions. They can be picked up in the database by the following
patterns.

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]
Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE]

Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P
[1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[1589] 692. Uncharacterized protein family UPF0007 signature 

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical protein ygbP and HI0672,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacM. - Mycobacterium
tuberculosis hypothetical protein MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein slr0951. - A
Rhodobacter capsulatus hypothetical protein in nifR3 5'region. Except for the Rhodobacter protein, which contains a
C-terminal extension, all these proteins have from 225 to 236 amino acids. They are hydrophilic proteins that can be
picked up in the database by the following pattern.
Consensus pattern: V-L-[IV]-H-D-[GA]-A-R
[1] Bairoch A. Unpublished observations (1997).
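
The consensus patterns in this listing use a compact notation: residues are separated by hyphens, x stands for any residue, square brackets list the residues allowed at a position, curly braces list residues excluded, and a parenthesized number or range gives a repeat count. As a minimal sketch of how such a pattern could be applied to a sequence, the following Python translates the notation into a regular expression. The function name prosite_to_regex and the toy sequence are illustrative assumptions and are not part of the original disclosure; anchors and other less common notation are deliberately not handled.

```python
import re

# One pattern element: a core (residue, x, [..] or {..}) with an optional
# repeat count such as (2) or (5,8).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated consensus pattern into a Python regex.

    Hypothetical helper for illustration only.
    """
    pieces = []
    for element in pattern.split("-"):
        element = element.strip()
        if not element:
            # Tolerate stray or trailing hyphens.
            continue
        core, low, high = _ELEMENT.fullmatch(element).groups()
        if core == "x":
            piece = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            piece = core                     # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"  # excluded residues
        elif len(core) == 1 and core.isalpha():
            piece = core                     # a single fixed residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return "".join(pieces)


# The UPF0007 pattern quoted above, applied to a made-up toy sequence.
upf0007 = prosite_to_regex("V-L-[IV]-H-D-[GA]-A-R")
toy_sequence = "MKVLIHDGARQ"  # hypothetical, not a real protein
hit = re.search(upf0007, toy_sequence)
```

Here hit.start() gives the 0-based offset of the first match in the toy sequence, and a pattern with a ranged repeat such as x(5,8) becomes a bounded wildcard in the resulting expression.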
[1590] 693. Uncharacterized protein family UPF0015 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome II
hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical protein YMR101c. - Escherichia coli hypothetical
protein yaeU and HI0920, the corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein
HP1721. - Mycobacterium leprae hypothetical protein B1937_F2_65. - A Corynebacterium glutamicum hypothetical
protein in aroF 3'region. - A Streptomyces fradiae hypothetical protein in transposon Tn4556. - Synechocystis strain
PCC 6803 hypothetical protein sll0505. - Methanococcus jannaschii hypothetical protein MJ1372. These are proteins
of about 26 to 40 Kd whose central region is well conserved. They can be picked up in the database by the following
pattern.

Consensus pattern: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q-

[1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

[1591] 694. Uncharacterized protein family UPF0016 signature
The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast hypothetical protein
YBR187W. - Fission yeast hypothetical protein SpAC17G8.08c. - Mouse protein pFT27. - Synechocystis strain PCC
6803 hypothetical protein sll0615. These are hydrophobic proteins of 200 to 320 amino acids that seem to contain six
or seven transmembrane domains. A conserved region which seems, in the eukaryotic proteins of this family, to directly
follow the second transmembrane domain has been selected as a signature pattern.
Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A-
[1] Bairoch A. Unpublished observations (1996).
[1592] 695. Uncharacterized protein family UPF0021 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome VII
hypothetical protein YGL211w. - Dictyostelium discoideum protein veg136. - Methanococcus jannaschii hypothetical
proteins MJ1157 and MJ1478. These are proteins of from 300 to 360 residues. They can be picked up in the database
by the following pattern, which is located in their N-terminal section.
Consensus pattern: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D

[1] Bairoch A. Unpublished observations (1997).

[1593] 696. Uncharacterized protein family UPF0023 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Mouse protein 22A3. -
Yeast chromosome XII hypothetical protein YLR022c. - Caenorhabditis elegans hypothetical protein W06E11.4. -
Methanococcus jannaschii hypothetical protein MJ0592. These are hydrophilic proteins of about 30 Kd. They can be picked
up in the database by the following pattern.

[1594] Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G-
[1595] [1] Bairoch A. Unpublished observations (1997).

[1596] 697. Uncharacterized protein family UPF0024 signature. The following uncharacterized proteins have been
shown [1] to share regions of similarities: - Escherichia coli hypothetical protein ygbO and HI0701, the corresponding
Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein HP0926. - Yeast chromosome XV
hypothetical protein YOR243c. - Caenorhabditis elegans hypothetical protein B0024.11. - Methanococcus jannaschii
hypothetical proteins MJ0588 and MJ1364. These are hydrophilic proteins of from 39 to 77 Kd. They can be picked up in the
database by the following pattern.

[1597] Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC]-
[1] Bairoch A. Unpublished observations (1997).

[1598] 698. Uncharacterized protein family UPF0025 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yfcE. - Bacillus subtilis hypothetical protein ysnB. - Mycoplasma genitalium and pneumoniae
hypothetical protein MG207. - Methanococcus jannaschii hypothetical proteins MJ0623 and MJ0936. These are hydrophilic
proteins of about 20 Kd. They can be picked up in the database by the following pattern.

Consensus pattern: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G
[1] Bairoch A. Unpublished observations (1997).
[1599] 699. Uncharacterized protein family UPF0029 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Yeast chromosome III
hypothetical protein YCR59c. - Yeast chromosome IV hypothetical protein YDL177C. - Escherichia coli hypothetical
protein yigZ and HI0722, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein
yvyE. - A Thermus aquaticus hypothetical protein in pol 5'region. These proteins can be picked up in the database by
the following pattern.

Consensus pattern: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-G-x(2)-[LIVM]-G
[1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994).

[1600] 700. Uncharacterized protein family UPF0030 signature

The following uncharacterized proteins have been shown [1] to be highly similar: - Yeast chromosome VI hypothetical
protein YFL060c. - Yeast chromosome XIII hypothetical protein YMR095c. - Yeast chromosome XIV hypothetical protein
YNL334C. - Bacillus subtilis hypothetical protein yaaE. - Haemophilus influenzae hypothetical protein HI1648. -
Methanococcus jannaschii hypothetical protein MJ1661. These are hydrophilic proteins of about 19 to 25 Kd. They can be
picked up in the database by the following pattern.

Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]

[1] Bairoch A. Unpublished observations (1997).

[1601] 701. Uncharacterized protein family UPF0032 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yigU and HI0188, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical
protein ycbT. - Mycobacterium tuberculosis hypothetical protein MtCY49.33c and U2126A, the corresponding
Mycobacterium leprae protein. - Synechocystis strain PCC 6803 hypothetical protein sll0194. - Odontella sinensis and
Porphyra purpurea chloroplast hypothetical protein ycf43. These proteins have from 245 to 317 amino acids and seem to
contain at least six or seven transmembrane regions. A conserved region located in the central section of these proteins
has been developed as a signature pattern.

Consensus pattern: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM]-
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).

[1602] 702. Uncharacterized protein family UPF0034 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yhdG and HI0979, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical
protein yjbN and HI0634, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein
yohI and HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacF. -
Rhodobacter capsulatus protein nifR3 and related proteins in Azospirillum brasilense and Rhizobium leguminosarum.
- Synechocystis strain PCC 6803 hypothetical protein slr0644. - Synechocystis strain PCC 6803 hypothetical protein
sll0926. - Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast hypothetical protein
YLR401C. - Yeast hypothetical protein YLR405w. - Yeast hypothetical protein YML080w. Although it has been proposed
[2] that Rhodobacter capsulatus nifR3 is a transcriptional regulatory protein, it is believed that these proteins constitute
a family of enzymes whose active site could include a conserved cysteine which has been used as the central part of
a signature pattern.

Consensus pattern: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC]

[1] Bairoch A., Rudd K.E. Unpublished observations (1995).
[2] Foster-Hartnett D., Cullen P.J., Gabbert K.K., Kranz R.G. Mol. Microbiol. 8:903-914(1993).

[1603] 703. Uncharacterized protein family UPF0038 signature

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Escherichia coli
hypothetical protein yacE and HI0890, the corresponding Haemophilus influenzae protein. - Mycobacterium tuberculosis
hypothetical protein MtCY01B2.23 and O410, the corresponding Mycobacterium leprae protein. - Synechocystis strain
PCC 6803 hypothetical protein slr0553. - Other hypothetical proteins from Aeromonas hydrophila, Bacteroides
nodosus, Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus and Xanthomonas campestris. - Human
hypothetical protein pOV-2. - Yeast hypothetical protein YDR196C. - Caenorhabditis elegans hypothetical protein
T05G5.5. These proteins all contain, in their N-terminal extremity, an ATP/GTP-binding motif 'A' (P-loop) (see
<PDOC00017>). The size of these proteins ranges from 200 to 290 residues (with the exception of the Mycobacterial
sequences, which are 410 residues long). A conserved region some 50 residues away from the ATP-binding P-loop
has been developed as a signature pattern.

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV]-
[1] Rudd K.E., Bairoch A. Unpublished observations (1997).

[1604] 704. Ubiquitin-conjugating enzymes active site 

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) [1,2,3] catalyze the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from an ubiquitin-activating enzyme (E1) to E2, which later ligates
ubiquitin directly to substrate proteins with or without the assistance of 'N-end' recognizing proteins (E3). In most
species there are many forms of UBC (at least 9 in yeast) which are implicated in diverse cellular functions. A cysteine
residue is required for ubiquitin-thiolester formation. There is a single conserved cysteine in UBC's and the region
around that residue is conserved in the sequence of known UBC isozymes. That region has been used as a signature
pattern.

Consensus pattern: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-[LIV]-x-[LIV] [C is the active site residue]
[1] Jentsch S., Seufert W., Sommer T., Reins H.-A. Trends Biochem. Sci. 15:195-198(1990).
[2] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[3] Hershko A. Trends Biochem. Sci. 16:265-268(1991).
[1605] 705. Uroporphyrinogen decarboxylase signatures

Uroporphyrinogen decarboxylase (URO-D), the fifth enzyme of the heme biosynthetic pathway, catalyzes the sequential
decarboxylation of the four acetyl side chains of uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency
is responsible for the human genetic diseases familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic
porphyria (HEP). The sequence of URO-D has been well conserved throughout evolution. The best conserved region is
located in the N-terminal section; it contains a perfectly conserved hexapeptide. There are two arginine residues in this
hexapeptide which could be involved in the binding, via salt bridges, to the carboxyl groups of the propionate side chains
of the substrate. This region has been used as a signature pattern. A second signature pattern is based on another
well conserved region which is located in the central section of the protein.
Consensus pattern: P-x-W-x-M-R-Q-A-G-R

Consensus pattern: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK]
[1] Garey J.R., Labbe-Bois R., Chelstowska A., Rytka J., Harrison L., Kushner J., Labbe P. Eur. J. Biochem.
205:1011-1016(1992).

[1606] 706. ubiE/COQ5 methyltransferase family signatures

The following methyltransferases have been shown [1] to share regions of similarities: - Escherichia coli ubiE, which
is involved in both ubiquinone and menaquinone biosynthesis and which catalyzes the S-adenosylmethionine
dependent methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol
and of demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis methyltransferase. - Bacillus
subtilis spore germination protein C2 (gene: gercB or gerC2), a probable menaquinone biosynthesis methyltransferase.
- Lactococcus lactis gerC2 homolog. - Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani
amastigote-specific protein A41. These are hydrophilic proteins of about 30 Kd (except for ZK652.9, which is 65 Kd).
They can be picked up in the database by the following patterns.
Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W
Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S

[1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).
[1607] 707. Uricase signature 

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of urate into allantoin. Some
species, like primates and birds, have lost the gene for uricase and are therefore unable to degrade urate. Uricase is
a protein of 300 to 400 amino acids. A highly conserved region located in the central part of the sequence has been
used as a signature pattern.

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-x(2)-L-x(5)-R
[1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988).
[1608] 708. Universal stress protein family (Usp) 

[1609] Induced by a wide range of stress conditions. Members of the Usp family are predicted to be related to the MADS-box
transcription factor proteins and to bind to DNA [2]. Number of members: 39

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Nystrom T,
Neidhardt FC; Mol Microbiol 1994;11:537-544.

[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. Mushegian AR, Koonin
EV; Genetics 1996;144:817-828.

[1610] 709. Ubiquitin domain signature and profile 

Ubiquitin [1,2,3] is a protein of seventy six amino acid residues, found in all eukaryotic cells and whose sequence is
extremely well conserved from protozoan to vertebrates. It plays a key role in a variety of cellular processes, such as
ATP-dependent selective degradation of cellular proteins, maintenance of chromatin structure, regulation of gene
expression, stress response and ribosome biogenesis. In most species, there are many genes coding for ubiquitin.
However, they can be classified into two classes. The first class produces polyubiquitin molecules consisting of exact head
to tail repeats of ubiquitin. The number of repeats is variable (up to twelve in a Xenopus gene). In the majority of
polyubiquitin precursors, there is a final amino-acid after the last repeat. The second class of genes produces precursor
proteins consisting of a single copy of ubiquitin fused to a C-terminal extension protein (CEP). There are two types of
CEP proteins and both seem to be ribosomal proteins. Ubiquitin is a globular protein, the last four C-terminal residues
(Leu-Arg-Gly-Gly) extending from the compact structure to form a 'tail', important for its function. The latter is mediated
by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and
the epsilon amino group of lysine residues in the target proteins. There are a number of proteins which are evolutionarily
related to ubiquitin: - Ubiquitin-like proteins from baculoviruses as well as in some strains of bovine viral diarrhea viruses
(BVDV). These proteins are highly similar to their eukaryotic counterparts. - Mammalian protein GDX [4]. GDX is
composed of two domains, a N-terminal ubiquitin-like domain of 74 residues and a C-terminal domain of 83 residues with
some similarity with the thyroglobulin hormonogenic site. - Mammalian protein FAU [5]. FAU is a fusion protein which
consists of a N-terminal ubiquitin-like protein of 74 residues fused to ribosomal protein S30. - Mouse protein NEDD-8
[6], a ubiquitin-like protein of 81 residues. - Human protein BAT3, a large fusion protein of 1132 residues that contains
a N-terminal ubiquitin-like domain. - Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein which consists
of a N-terminal ubiquitin-like protein of 70 residues fused to ribosomal protein S27A. - Yeast DNA repair protein RAD23
[8]. RAD23 contains a N-terminal domain that seems to be distantly, yet significantly, related to ubiquitin. - Mammalian
RAD23-related proteins RAD23A and RAD23B. - Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a protein
of 274 residues that contains a central ubiquitin-like domain. - Human spliceosome associated protein 114 (SAP 114
or SF3A120). - Yeast protein DSK2, a protein involved in spindle pole body duplication and which contains a N-terminal
ubiquitin-like domain. - Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis
elegans hypothetical protein F53F4.3. These proteins contain a N-terminal ubiquitin domain and a C-terminal
CAP-Gly domain. - Schizosaccharomyces pombe hypothetical protein SpAC26A3.16. This protein contains a N-terminal
ubiquitin domain. - Yeast protein SMT3. - Human ubiquitin-like proteins SMT3A and SMT3B. - Human ubiquitin-like
protein SMT3C (also known as PIC1; Ubl1; Sumo-1; Gmp-1 or Sentrin). This protein is involved in targeting ranGAP1
to the nuclear pore complex protein ranBP2. - SMT3-like proteins in plants and Caenorhabditis elegans. To identify
ubiquitin and related proteins, a pattern has been developed based on conserved positions in the central section of
the sequence. A profile was also developed that spans the complete length of the ubiquitin domain.
Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE]

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[2] Monia B.P., Ecker D.J., Crooke S.T. Bio/Technology 8:209-215(1990).
[3] Finley D., Varshavsky A. Trends Biochem. Sci. 10:343-347(1985).
[4] Filippi M., Tribioli C., Toniolo D. Genomics 7:453-457(1990).
[5] Olvera J., Wool I.G. J. Biol. Chem. 268:17967-17974(1993).
[6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. 195:393-399(1993).
[7] Jones D., Candido E.P. J. Biol. Chem. 266:19545-19551(1993).
[8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).
[1611] 710. VHS domain 

[1612] Domain present in VPS-27, Hrs and STAM. Number of members: 27 

[1613] 711. Vinculin family signatures

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the
plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts or adhesion plaques. In addition to actin,
vinculin interacts with other structural proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd
(about 1000 residues). Structurally the protein consists of an acidic N-terminal domain of about 90 Kd separated
from a basic C-terminal domain of about 25 Kd by a proline-rich region of about 50 residues. The central part of the
N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110
amino acid domain. Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of cadherins.
The association of catenins to cadherins produces a complex which is linked to the actin filament network, and which
seems to be of primary importance for the cell-adhesion properties of cadherins. Three different types of catenins seem to
exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionarily related to vinculin.
In terms of their structure, the most significant differences are the absence, in alpha-catenin, of the repeated domain and
of the proline-rich segment. Two signature patterns for this family of proteins have been developed. The first pattern is
located in the N-terminal section of both vinculin and alpha-catenins and is part, in vinculin, of a domain that seems to
be involved in the interaction with talin. The second pattern is based on a conserved region in the N-terminal part of
the repeated domain of vinculin.

Consensus pattern: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L
Consensus pattern: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P

[1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990).
[2] Herrenknecht K., Ozawa M., Eckerskorn C., Lottspeich F., Lenter M., Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).
[1614] 712. (Vitellogenin N) Lipoprotein amino terminal region

[1615] This family contains regions from: Vitellogenin, Microsomal triglyceride transfer protein and apolipoprotein
B-100. These proteins are all involved in lipid transport [1]. This family contains the LV1n chain from lipovitellin, that
contains two structural domains. Number of members: 33

[1616] [1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Anderson TA, Levitt DG,
Banaszak LJ; Structure 1998;6:895-909.

[1617] 713. (VMSA) Major surface antigen from hepadnavirus

[1618] 714. ssDNA binding protein (Viral DNA bp)

This protein is found in herpesviruses and is needed for replication.

[1619] 715. (Voltage CLC) Voltage gated chloride channels
[1620] This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a single pore. It
has been shown that some members of this family form homodimers. These proteins contain two CBS domains.

[1] Schmidt-Rose T, Jentsch TJ; J Biol Chem 1997;272:20515-20521.

[2] Zhang J, George AL Jr, Griggs RC, Fouad GT, Roberts J, Kwiecinski H, Connolly AM, Ptacek LJ; Neurology
1996;47:993-998.

[1621] 716. von Willebrand factor type A domain (vwa) 

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
Bork P, Rohde K;
Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J.
von Willebrand factor.
FASEB J. 7 308-316 (1993).

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.
Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins.
MATRIX 13 297-306 (1993).

3. PERKINS, S.J., SMITH, K.F., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D. and SIM, R.B.
The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier
transform infrared spectroscopy.
Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other proteins by averaged structure predictions.
J.MOL.BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K.
More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous
protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
BIOCHEM.J. 279 908-910 (1991).

5. EDWARDS, Y.J.K. and PERKINS, S.J.
The protein fold of the von Willebrand factor type A domain is predicted to be similar to the open twisted
beta-sheet flanked by alpha-helices found in human ras-p21.
FEBS LETT. 358 283-286 (1995).

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.
Crystal structure of the A domain from the alpha subunit of integrin CR3 (CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.
Crystal structure of the I-domain from the CD11a/CD18 (LFA-1, alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

[1622] The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved
in the aetiology of bleeding disorders [1]. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein
superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the
integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins that incorporate
vWF domains participate in numerous biological events (e.g., cell adhesion, migration, homing, pattern formation, and
signal transduction), involving interaction with a large array of ligands [2]. Secondary structure prediction from 75
aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands [3]. Fold
recognition algorithms were used to score sequence compatibility with a library of known structures: the vWF domain fold
was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 3D structures have been
determined for the I-domains of integrins CD11b (with bound magnesium) [6] and CD11a (with bound manganese) [7].
The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its
surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for
binding protein ligands [6]. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are
completely conserved, but the manner in which the metal ion is coordinated differs slightly [7].

[1623] VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF domain superfamily. The
fingerprint was derived from an initial alignment of 14 sequences. Motif 1 includes the first beta-strand and 3 conserved
residues involved in metal ion coordination in I-domains (Asp and 2 serines in positions 8, 10 and 12, respectively);
motif 2 spans strands beta-2 and beta-Z; and motif 3 encodes beta-strand 3 and a conserved Asp (in position 7), which
coordinates the metal ion [6,7]. Three iterations on OWL27.0 were required to reach convergence, at which point a
true set comprising 56 sequences was identified. Numerous partial matches were also found.
[1624] 717. (WD40) WD domain, G-beta repeat

The ancient regulatory-protein family of WD-repeat proteins.
Neer EJ, Schmidt CJ, Nambudripad R, Smith TF;
Nature 1994;371:297-300.

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) of the guanine nucleotide-binding
proteins (G proteins) which act as intermediaries in the transduction of signals generated by transmembrane receptors
[1]. The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but
they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor
recognition.

[1625] In higher eukaryotes G-beta exists as a small multigene family of highly conserved proteins of about 340
amino acid residues. Structurally G-beta consists of eight tandem repeats of about 40 residues, each containing a
central Trp-Asp motif (this type of repeat is sometimes called a WD-40 repeat). Such a repetitive segment has been
shown [E1,2,3,4,5] to exist in a number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta like protein that associates with
GPA1 (G-alpha) and STE18 (G-gamma).

- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is most probably also a G-beta protein.

- Human and chicken protein 12.3. The function of this protein is not known, but on the basis of its similarity to
G-beta proteins, it may also function in signal transduction.

- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog of vertebrate protein 12.3.

- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].

- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly
associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.

- Yeast CDC4, essential for initiation of DNA replication and separation of the spindle pole bodies to form the poles
of the mitotic spindle.

- Yeast CDC20, a protein required for two microtubule-dependent processes: nuclear movements prior to anaphase
and chromosome separation.

- Yeast MAK11, essential for cell growth and for the replication of M1 double-stranded RNA.

- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with a probable role in mRNA splicing.

- Yeast PWP1, a protein of unknown function.

- Yeast SKI6, a protein essential for controlling the propagation of double-stranded RNA.

- Yeast SOF1, a protein required for ribosomal RNA processing which associates with U3 small nucleolar RNA.

- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been implicated in dTMP uptake,
catabolite repression, mating sterility, and many other phenotypes.

- Yeast YCR57c, an ORF of unknown function from chromosome III.

- Yeast YCR72c, an ORF of unknown function from chromosome III.

- Slime mold coronin, an actin-binding protein.

- Slime mold AAC3, a developmentally regulated protein of unknown function.

- Drosophila protein Groucho (formerly known as E(spl): 'enhancer of split'), a protein involved in neurogenesis and
that seems to interact with the Notch and Delta proteins.

- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

35 

[1626] The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and Groucho) and 8 (G-beta, STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and G-beta like proteins, the repeats span the entire length of the sequence, while in other proteins, they make up the N-terminal, the central or the C-terminal section.
[1627] A signature pattern can be developed from the central core of the domain (positions 9 to 23).

- Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAG]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]

[ 1] Gilman A.G.
Annu. Rev. Biochem. 56:615-649(1987).
[ 2] Duronio R.J., Gordon J.I., Boguski M.S.
Proteins 13:41-56(1992).
[ 3] van der Voorn L., Ploegh H.L.
FEBS Lett. 307:131-134(1992).
[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.
Nature 371:297-300(1994).
[ 5] Smith T.F., Gaitatzes C.G., Saxena K., Neer E.J.
Biochemistry In Press(1998).

[1628] 718. WHEP-TRS domain containing proteins

A conserved domain of 46 amino acids has been shown [1] to exist in a number of higher eukaryote aminoacyl-transfer RNA synthetases. This domain is present one to six times in the following enzymes:

- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present three times in a region that separates the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain.
- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present six times in the intercatalytic region.
- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the N-terminal extremity.
- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is found at the N-terminal extremity [2].
- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal extremity.

[1629] This domain, which is called WHEP-TRS, could contain a central alpha-helical region and may play a role in the association of tRNA-synthetases into multienzyme complexes.

[1630] A signature pattern based on the first 29 positions of the WHEP domain has been developed.

- Consensus pattern: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M.
EMBO J. 10:4267-4277(1991).
[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

[1631] 719. (Worm family 8) Putative membrane protein

Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R;
Genomics 1997;46:200-216.

This family, called family 8 in [1], may be a transmembrane protein.
The specific function of this protein is unknown.
[1632] 720. Xylose isomerase

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which catalyzes the interconversion of D-xylose to D-xylulose. It can also isomerize D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems to require magnesium for its activity, while cobalt is necessary to stabilize the tetrameric structure of the enzyme. A number of residues are conserved in all known xylose isomerases.

[1633] Xylose isomerase also exists in plants [2] where it is homodimeric and is manganese-dependent.
[1634] Two signature patterns for xylose isomerase have been developed. The first one is derived from a stretch of five conserved amino acids that includes a glutamic acid residue known to be one of the four residues involved in the binding of the magnesium ion [3]; this pattern also includes a lysine residue which is involved in the catalytic activity. The second pattern is derived from a conserved region in the N-terminal section of the enzyme that includes a histidine residue which has been shown [4] to be involved in the catalytic mechanism of the enzyme.
- Consensus pattern: [LI]-E-P-K-P-x(2)-P
[E is a magnesium ligand]
[K is an active site residue]

- Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]
[H is an active site residue]
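
By way of illustration only, consensus patterns written in the notation used above can be translated mechanically into regular expressions. The following Python sketch assumes the usual reading of that notation: elements separated by '-', 'x' for any residue, square brackets for allowed residues, braces for excluded residues, and a parenthesised count or range for repeats. The function name prosite_to_regex and the toy peptide are hypothetical and are not taken from the present disclosure.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a consensus pattern such as '[LI]-E-P-K-P-x(2)-P' into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        if not element:
            continue  # tolerate a stray leading or trailing separator
        # Split an element into its body and an optional repeat, e.g. 'x(2)' or 'x(2,4)'.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        body, lo, hi = m.group(1), m.group(2), m.group(3)
        if body == "x":
            token = "."                       # any residue
        elif body.startswith("[") and body.endswith("]"):
            token = body                      # set of allowed residues
        elif body.startswith("{") and body.endswith("}"):
            token = "[^" + body[1:-1] + "]"   # set of excluded residues
        else:
            token = re.escape(body)           # a single fixed residue
        if lo and hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        parts.append(token)
    return "".join(parts)


# Hypothetical toy peptide, used only to exercise the first xylose isomerase signature.
signature = prosite_to_regex("[LI]-E-P-K-P-x(2)-P")
hit = re.search(signature, "MAALEPKPTQPGG")
print(signature, hit.group(0) if hit else None)
```

A match obtained in this way indicates only that a sequence is compatible with the signature; it does not by itself establish enzymatic activity.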

[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).
[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).
[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).
[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).

[1635] 721. XPG protein signatures. Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease, characterized by a high incidence of sunlight-induced skin cancer. The skin cells of people with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. The defect in XP-G can be corrected by a 133 Kd nuclear protein called XPG (or XPGC) [2]. XPG belongs to a family of proteins [2,3,4,5,6] that are composed of two main subsets:

- Subset 1, to which belongs XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2 and XPG are single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide excision repair [9].
- Subset 2, to which belongs mouse and human FEN-1, rad2 from fission yeast, and RAD27 from budding yeast. FEN-1 is a structure-specific endonuclease.

In addition to the proteins listed in the above groups, this family also includes:

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects mismatched base pairs.
- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.
- Yeast DIN7.

Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved; indeed, they are largely absent from proteins belonging to the second subset. Two signature patterns have been developed for these proteins. The first corresponds to the central part of the N-region, the second to part of the I-region and includes the putative catalytic core pentapeptide.

[1636] Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K

Consensus pattern: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM]

[1637] [ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432-435(1994).

[1638] 722. Xanthine/uracil permeases family

The following transport proteins, which are involved in the uptake of xanthine or uracil, are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.
- Purine permease (gene uapC) from Aspergillus nidulans.
- Xanthine permease from Bacillus subtilis (gene pbuX).
- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene pyrP).
- Hypothetical protein ycdG from Escherichia coli.
- Hypothetical protein ygfO from Escherichia coli.
- Hypothetical protein ygfU from Escherichia coli.
- Hypothetical protein yicE from Escherichia coli.
- Hypothetical protein yunJ from Bacillus subtilis.
- Hypothetical protein yunK from Bacillus subtilis.

[1639] They are proteins of from 430 to 595 residues that seem to contain 12 transmembrane domains. The best conserved region, which corresponds with what seems to be the tenth transmembrane domain, has been selected as a signature pattern.

- Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G

[ 1] Diallinas G., Gorfinkiel L., Arst G., Cecchetto G., Scazzocchio C. J. Biol. Chem. 270:8610-8622(1995).

[ 2] Andersen P.S., Frees D., Fast R., Mygind B. J. Bacteriol. 177:2008-2013(1995).

[1640] 723. Hypothetical yabO/yceC/sfhB family

The following proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have been shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA). It is responsible for synthesis of pseudouridine from uracil-746 in 23S rRNA.
- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase C (gene rluC). It is responsible for synthesis of pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.
- Escherichia coli (and homologs in other bacteria) large subunit pseudouridine synthase D (gene rluD).
- Yeast DRAP deaminase (gene RIB2).
- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein.
- Haemophilus influenzae hypothetical protein HI0042.
- Aquifex aeolicus hypothetical protein AQ_1758.
- Bacillus subtilis hypothetical protein yhcT.
- Bacillus subtilis hypothetical protein yjbO.
- Bacillus subtilis hypothetical protein ylyB.
- Helicobacter pylori hypothetical protein HP0347.
- Helicobacter pylori hypothetical protein HP0745.
- Helicobacter pylori hypothetical protein HP0956.
- Mycoplasma genitalium hypothetical protein MG209.
- Mycoplasma genitalium hypothetical protein MG370.
- Synechocystis strain PCC 6803 hypothetical protein slr1592.
- Synechocystis strain PCC 6803 hypothetical protein slr1629.
- Yeast hypothetical protein YDL036c.
- Yeast hypothetical protein YGR169c.
- Fission yeast hypothetical protein SpAC18B11.02c.
- Caenorhabditis elegans hypothetical protein K07E8.7.

[1641] These are proteins of from 21 to 50 Kd which contain a number of conserved regions in their central section. They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: [LIVCA]-[NHYT]-R-[LI]-D-x(2)-T-[STA]-G-[LIVAGC]-[LIVMF](2)-[LIVMFGC]-[SGTACV]

[1642] [ 1] Conrad J., Sun D., Englund N., Ofengand J. J. Biol. Chem. 273:18562-18566(1998).

[1643] In addition, the following bacterial proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], also have been shown to share regions of similarities:

- Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516 synthase (EC 4.2.1.70) (gene: rsuA). This enzyme is responsible for the formation of pseudouridine from uracil-516 in 16S ribosomal RNA.
- Escherichia coli hypothetical protein yciL and HI1199, the corresponding Haemophilus influenzae protein.
- Escherichia coli hypothetical protein yjbC.
- Escherichia coli hypothetical protein ymfC and HI0694, the corresponding Haemophilus influenzae protein.
- Aquifex aeolicus hypothetical protein AQ_554.
- Aquifex aeolicus hypothetical protein AQ_1464.
- Bacillus subtilis hypothetical protein ypuL.
- Bacillus subtilis hypothetical protein ytzF.
- Borrelia burgdorferi hypothetical protein BB0129.
- Helicobacter pylori hypothetical protein HP1459.
- Synechocystis strain PCC 6803 hypothetical protein slr0361.
- Synechocystis strain PCC 6803 hypothetical protein slr0612.

[1644] These are proteins of from 25 to 40 Kd which contain a number of conserved regions in their central section. They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST]

[1645] [ 1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J. Biochemistry 34:8904-8913(1995).
[1646] 724. Zinc finger present in dystrophin, CBP/p300
ZZ in dystrophin binds calmodulin.
Putative zinc finger, binding not yet shown.
[1647] 725. Zinc carboxypeptidase

There are a number of different types of zinc-dependent carboxypeptidases (EC 3.4.17.-) [1,2]. All these enzymes seem to be structurally and functionally related. The enzymes that belong to this family are listed below.

- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can remove all C-terminal amino acids with the exception of Arg, Lys and Pro.
- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a specificity similar to that of carboxypeptidase A1, but with a preference for bulkier C-terminal residues.
- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but one that preferentially removes C-terminal Arg and Lys.
- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase), a plasma enzyme which protects the body from potent vasoactive and inflammatory peptides containing C-terminal Arg or Lys (such as kinins or anaphylatoxins) which are released into the circulation.
- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or carboxypeptidase E), an enzyme located in secretory granules of pancreatic islets, adrenal gland, pituitary and brain. This enzyme removes residual C-terminal Arg or Lys remaining after initial endoprotease cleavage during prohormone processing.
- Carboxypeptidase M (EC 3.4.17.12), a membrane bound Arg and Lys specific enzyme. It is ideally situated to act on peptide hormones at local tissue sites where it could control their activity before or after interaction with specific plasma membrane receptors.
- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity similar to that of carboxypeptidase A, but found in the secretory granules of mast cells.
- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which combines the specificities of mammalian carboxypeptidases A and B.
- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4], which also combines the specificities of carboxypeptidases A and B.
- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems to regulate transcription by cleavage of other transcriptional proteins.
- Yeast hypothetical protein YHR132c.

[1648] All of these enzymes bind an atom of zinc. Three conserved residues are implicated in the binding of the zinc atom: two histidines and a glutamic acid. Two signature patterns which contain these three zinc-ligands have been derived.

- Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIV]-[STAG]-x(6)-[LIVMFYTA] [H and E are zinc ligands]

- Consensus pattern: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW] [H is a zinc ligand]

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).
[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F., Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).
[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).
[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I., Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.P., Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).
[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).
[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L., Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).
[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

[1649] 726. Zinc finger, C2H2 type

The C2H2 zinc finger is the classical zinc finger domain.

The two conserved cysteines and histidines co-ordinate a zinc ion. The following pattern describes the zinc finger.
#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]

Where X can be any amino acid, and numbers in brackets indicate the number of residues. The positions marked # are those that are important for the stable fold of the zinc finger. The final position can be either his or cys.

The C2H2 zinc finger is composed of two short beta strands followed by an alpha helix. The amino terminal part of the helix binds the major groove in DNA binding zinc fingers.

[1650] 'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first identified in the Xenopus transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues. There are two cysteine or histidine residues at both extremities of the domain, which are involved in the tetrahedral coordination of a zinc atom. It has been proposed that such a domain interacts with about five nucleotides. A schematic representation of a zinc finger domain is shown below:
[1651] Many classes of zinc fingers are characterized according to the number and positions of the histidine and cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first pair of zinc coordinating residues are cysteines, while the second pair are histidines. A number of experimental reports have demonstrated the zinc-dependent DNA or RNA binding property of some members of this class.
[1652] Some of the proteins known to include C2H2-type zinc fingers are listed below. The number of zinc finger regions found in each of these proteins is indicated between brackets; a '+' symbol indicates that only partial sequence data is available and that additional finger domains may be present.

- Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1 (2), MSN2 (2), MSN4 (2), RGM1 (2), RIM1 (3), RME1 (3), SFP1 (2), SSL1 (1), STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2).
- Emericella nidulans: brlA (2), creA (2).
- Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot (5), Glass (5), Hunchback (6), Kruppel (5), Kruppel-H (4+), Odd-skipped (4), Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serendipity locus beta (6), delta (7), h-1 (8), Suppressor of hairy wing su(Hw) (12), Suppressor of variegation suvar(3)7 (5), Teashirt (3) and Tramtrack (2).
- Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9), Xfin (37), Xsna (5), gastrula XlcGF5.1 to XlcGF71.1 (from 4+ to 11+), Oocyte XlcOF2 to XlcOF22 (from 7 to 12).
- Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3) and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4), EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1 (10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1 (9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX (13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1 (13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3).

[1653] In addition to the conserved zinc ligand residues it has been shown [6] that a number of other positions are also important for the structural integrity of the C2H2 zinc fingers. The best conserved position is found four residues after the second cysteine; it is generally an aromatic or aliphatic residue.

- Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H [The two C's and two H's are zinc ligands]
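
As a hedged sketch, the C2H2 consensus pattern above may be applied to a protein sequence with a regular expression. Wrapping the expression in a lookahead allows a candidate finger to be reported at every start position, which is convenient because zinc fingers commonly occur as tandem repeats. The function name find_c2h2 and the two toy finger sequences are assumptions made for illustration only and are not sequences of the invention.

```python
import re

# Regex reading of C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
# The zero-width lookahead lets finditer report a hit at every start position.
C2H2 = re.compile(r"(?=(C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H))")


def find_c2h2(seq):
    """Return (1-based start, matched text) for each position where the pattern fits."""
    return [(m.start() + 1, m.group(1)) for m in C2H2.finditer(seq.upper())]


# Hypothetical toy fingers joined by a short linker; not real protein sequences.
FINGER_1 = "CPECGKSFSQKSNLQKHQRTH"
FINGER_2 = "CDVCGKAYTRSSHLARHQKIH"
TOY_PROTEIN = "MA" + FINGER_1 + "TGEKPY" + FINGER_2 + "KL"

for start, text in find_c2h2(TOY_PROTEIN):
    print(start, text)
```

A hit shows only that the spacing of the cysteine and histidine ligands fits the pattern; it is not evidence that the region binds zinc or nucleic acid.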

[ 1] Klug A., Rhodes D.
Trends Biochem. Sci. 12:464-469(1987).
[ 2] Evans R.M., Hollenberg S.M.
Cell 52:1-3(1988).
[ 3] Payre F., Vincent A.
FEBS Lett. 234:245-250(1988).
[ 4] Miller J., McLachlan A.D., Klug A.
EMBO J. 4:1609-1614(1985).
[ 5] Berg J.M.
Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).
[ 6] Rosenfeld R., Margalit H.
J. Biomol. Struct. Dyn. 11:557-570(1993).

[1654] 727. Zinc finger, C3HC4 type (RING finger)

A number of eukaryotic and viral proteins contain a conserved cysteine-rich domain of 40 to 60 residues (called C3HC4 zinc-finger or 'RING' finger) [1] that binds two atoms of zinc, and is probably involved in mediating protein-protein interactions. The 3D structure of the zinc ligation system is unique to the RING domain and is referred to as the 'cross-brace' motif. The spacing of the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C.
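
The ligand spacing stated above can be sketched in code as a list of ligands and permitted gap lengths, from which a regular expression with one capturing group per ligand is assembled. The following Python example is illustrative only; the names LIGANDS, GAPS, build_ring_regex and ligand_positions, and the toy sequence, are hypothetical and chosen merely to show how the eight putative zinc ligands of a candidate domain might be located.

```python
import re

# Eight ligands of the C3HC4 arrangement, and the (min, max) gap between neighbours:
# C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C
LIGANDS = "CCCHCCCC"
GAPS = [(2, 2), (9, 39), (1, 3), (2, 3), (2, 2), (4, 48), (2, 2)]


def build_ring_regex():
    """Assemble a regex with one capturing group per ligand residue."""
    pieces = ["(%s)" % LIGANDS[0]]
    for residue, (lo, hi) in zip(LIGANDS[1:], GAPS):
        gap = ".{%d}" % lo if lo == hi else ".{%d,%d}" % (lo, hi)
        pieces.append(gap + "(%s)" % residue)
    return re.compile("".join(pieces))


RING = build_ring_regex()


def ligand_positions(seq):
    """Return 1-based positions of the eight ligands of the first fit, or None."""
    m = RING.search(seq.upper())
    if m is None:
        return None
    return [m.start(i) + 1 for i in range(1, len(LIGANDS) + 1)]


# Hypothetical toy sequence in which alanine fills every gap; not a real RING domain.
TOY_RING = ("C" + "AA" + "C" + "A" * 10 + "C" + "A" + "H" + "AA" +
            "C" + "AA" + "C" + "A" * 5 + "C" + "AA" + "C")

print(ligand_positions(TOY_RING))
```

Because several gaps are variable, more than one assignment of ligands can be compatible with a real sequence; the sketch reports only the first one found by the regular expression engine.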

[1655] Proteins currently known to include the C3HC4 domain are listed below (references are only provided for recently determined sequences).

- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 activates the rearrangement of immunoglobulin and T-cell receptor genes.
- Mouse rpt-1. Rpt-1 is a trans-acting factor that regulates gene expression directed by the promoter region of the interleukin-2 receptor alpha chain or the LTR promoter region of HIV-1.
- Human rfp. Rfp is a developmentally regulated protein that may function in male germ cell development. Recombination of the N-terminal section of rfp with a protein tyrosine kinase produces the ret transforming protein.
- Human 52 Kd Ro/SS-A protein. A protein of unknown function from the Ro/SS-A ribonucleoprotein complex. Sera from patients with systemic lupus erythematosus or primary Sjogren's syndrome often contain antibodies that react with the Ro proteins.
- Human histocompatibility locus protein RING1.
- Human PML, a probable transcription factor. Chromosomal translocation of PML with retinoic receptor alpha creates a fusion protein which is the cause of acute promyelocytic leukemia (APL).
- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1].
- Mammalian cbl proto-oncogene.
- Mammalian bmi-1 proto-oncogene.
- Vertebrate CDK-activating kinase (CAK) assembly factor MAT1, a protein that stabilizes the complex between the CDK7 kinase and cyclin H (MAT1 stands for 'Menage A Trois').
- Mammalian mel-18 protein. Mel-18, which is expressed in a variety of tumor cells, is a transcriptional repressor that recognizes and binds a specific DNA sequence.
- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somewhat involved in the biogenesis of peroxisomes. In humans, defects in PAF-1 are responsible for a form of Zellweger syndrome, an autosomal recessive disorder associated with peroxisomal deficiencies.
- Human MAT1 protein, which interacts with the CDK7-cyclin H complex.
- Human RING1 protein.
- Xenopus XNF7 protein, a probable transcription factor.
- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the posttranscriptional regulation of genes in VSG expression sites or may interact with adenylate cyclase to regulate its activity.
- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste (Su(z)2). The two proteins belong to the Polycomb group of genes needed to maintain the segment-specific repression of homeotic selector genes.
- Drosophila protein male-specific msl-2, a DNA-binding protein which is involved in X chromosome dosage compensation (the elevation of transcription of the male single X chromosome).
- Arabidopsis thaliana protein COP1, which is involved in the regulation of photomorphogenesis.
- Fungal DNA repair proteins RAD5, RAD16, RAD18 and rad8.
- Herpesviruses trans-acting transcriptional protein ICP0/IE110. This protein, which has been characterized in many different herpesviruses, is a trans-activator and/or -repressor of the expression of many viral and cellular promoters.
- Baculoviruses protein CG30.
- Baculoviruses major immediate early protein (PE-38).
- Baculoviruses immediate-early regulatory protein IE-N/IE-2.
- Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1.
- Yeast hypothetical proteins YER116c and YKR017c.

[1656] The central region of the domain was selected as a signature pattern for the C3HC4 finger.

- Consensus pattern: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]

[1657] [ 1] Borden K.L.B., Freemont P.S.
Curr. Opin. Struct. Biol. 6:395-401(1996).

[1658] 728. Zinc finger C-x8-C-x5-C-x3-H type (and similar).
[1659] 729. Zinc finger, CCHC class

A family of CCHC zinc fingers, mostly from retroviral gag proteins (nucleocapsid). Prototype structure is from HIV.
Also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1.
Structure is an 18-residue zinc finger; no examples of indels in the alignment.
[1660] 730. Zn-finger in Ran binding protein and others.
[1661] 731. AN1-like Zinc finger

[1662] Zinc finger at the C-terminus of An1 (Swiss:Q91889), a ubiquitin-like protein in Xenopus laevis. The following pattern describes the zinc finger: C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C, where X can be any amino acid, and numbers in brackets indicate the number of residues.

[1663] [1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188.
[1664] 732. 14-3-3 proteins

Structure of a 14-3-3 protein and implications for coordination of multiple signalling pathways.
Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken A, Gamblin SJ;
Nature 1995;376:188-191.
Crystal structure of the zeta isoform of the 14-3-3 protein.
Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R;
Nature 1995;376:191-194.

[1665] Interaction of 14-3-3 with signaling proteins is mediated by the recognition of phosphoserine.
Muslin AJ, Tanner JW, Allen PM, Shaw AS;
Cell 1996;84:889-897.

[1666] The 14-3-3 protein binds its target proteins with a common site located towards the C-terminus.
Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, Omata S, Ohno S, Isobe T;
FEBS Lett 1997;413:273-276.

[1667] Molecular evolution of the 14-3-3 protein family.
Wang W, Shakes DC;
J Mol Evol 1996;43:384-398.
Function of 14-3-3 proteins.
Jin DY, Lyu MS, Kozak CA, Jeang KT;
Nature 1996;382:308-308.

[1668] The 14-3-3 proteins [1,2,3] are a family of closely related acidic homodimeric proteins of about 30 Kd which were first identified as being very abundant in mammalian brain tissues and located preferentially in neurons. The 14-3-3 proteins seem to have multiple biological activities and play a key role in signal transduction pathways and the cell cycle. They interact with kinases such as PKC or Raf-1; they seem to also function as protein-kinase dependent activators of tyrosine and tryptophan hydroxylases, and in plants they are associated with a complex that binds to the G-box promoter elements.

[1669] The 14-3-3 family of proteins is ubiquitously found in all eukaryotic species studied and has been sequenced in fungi (yeast BMH1 and BMH2, fission yeast rad24 and rad25), plants, Drosophila, and vertebrates. The sequences of the 14-3-3 proteins are extremely well conserved. Two highly conserved regions have been selected as signature patterns: the first is a peptide of 11 residues located in the N-terminal section; the second, a 20 amino acid region located in the C-terminal section.

- Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]

- Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]

[ 1] Aitken A.
Trends Biochem. Sci. 20:95-97(1995).
[ 2] Morrison D.
Science 266:56-57(1994).
[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A., Gamblin S.J.
Nature 376:188-191(1995).

[1670] 733. D-isomer specific 2-hydroxyacid dehydrogenases (2-Hacid DH)

This Pfam family covers the Formate dehydrogenase, D-glycerate dehydrogenase and D-lactate dehydrogenase families in SCOP. A number of NAD-dependent 2-hydroxyacid dehydrogenases which seem to be specific for the D-isomer of their substrate have been shown [1,2,3,4] to be functionally and structurally related. These enzymes are listed below.

- D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme which catalyzes the oxidation of D-lactate to pyruvate.
- D-glycerate dehydrogenase (EC 1.1.1.29) (NADH-dependent hydroxypyruvate reductase), a plant leaf peroxisomal enzyme that catalyzes the reduction of hydroxypyruvate to glycerate. This reaction is part of the glycolate pathway of photorespiration.
- D-glycerate dehydrogenase from the bacteria Hyphomicrobium methylovorum and Methylobacterium extorquens.
- 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that catalyzes the oxidation of D-3-phosphoglycerate to 3-phosphohydroxypyruvate. This reaction is the first committed step in the 'phosphorylated' pathway of serine biosynthesis.
- Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial enzyme involved in the biosynthesis of pyridoxine (vitamin B6).
- D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial enzyme that catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids and D-2-hydroxy-carboxylic acids.
- Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacteria Pseudomonas sp. 101 and various fungi [5].
- Vancomycin resistance protein vanH from Enterococcus faecium; this protein is a D-specific alpha-keto acid dehydrogenase involved in the formation of a peptidoglycan which does not terminate by D-alanine, thus preventing vancomycin binding.
- Escherichia coli hypothetical protein ycdW.
- Escherichia coli hypothetical protein yiaE.
- Haemophilus influenzae hypothetical protein HI1556.
- Yeast hypothetical protein YER081w.
- Yeast hypothetical protein YIL074w.

[1671] All these enzymes have similar enzymatic activities and are structurally related. Three of the most conserved regions of these proteins have been selected to develop patterns. The first pattern is based on a glycine-rich region located in the central section of these enzymes; this region probably corresponds to the NAD-binding domain. The two other patterns contain a number of conserved charged residues, some of which may play a role in the catalytic mechanism.

- Consensus pattern: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-[LIVfMT]-x(2)-[FYwCTH]-[DNSTK]

- Consensus pattern: [LIVMFYWA]-[LIVFYWC]-x(2)-[SAC]-[DNQHR]-[IVFA]-[LIVF]-x-[LIVF]-[HNI]-x-P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN]

- Consensus pattern: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-[LIVH]-[LIVMC]-[DNV]

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989).

[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys. Res. Commun. 184:60-66(1992).

[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12588-12594(1991).

[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994).

[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994).

[1672] 734. 2-oxo acid dehydrogenases acyltransferase (catalytic domain)

Refined crystal structure of the catalytic domain of dihydrolipoyl transacetylase (E2p) from Azotobacter vinelandii at 2.6 angstroms resolution.

Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG;
J Mol Biol 1993;230:1183-1199.

These proteins contain one to three copies of a lipoyl binding domain followed by the catalytic domain.
[1673] 735. 3-beta hydroxysteroid dehydrogenase/isomerase family
Structure and tissue-specific expression of 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase genes in human and rat classical and peripheral steroidogenic tissues.

Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A, Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al;

J Steroid Biochem Mol Biol 1992;41:421-435.
The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase (3 beta-HSD) catalyzes the oxidation and isomerization of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene steroid precursors into the corresponding 4-ene-ketosteroids necessary for the formation of all classes of steroid hormones.

[1674] 736. 3-hydroxyacyl-CoA dehydrogenase

This family also includes lambda crystallin.
Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase: preliminary chain tracing at 2.8-A resolution.

Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ;

Proc Natl Acad Sci U S A 1987;84:8262-8266.

[1675] 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an enzyme involved in fatty acid metabolism; it catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have 2 fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In peroxisomes 3-hydroxyacyl-CoA dehydrogenase forms, with enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional enzyme where the N-terminal domain bears the hydratase/isomerase activities and the C-terminal domain the dehydrogenase activity. There are two mitochondrial enzymes: one which is monofunctional and the other which is, like its peroxisomal counterpart, multifunctional.

[1676] In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA) HCDH is part of a multifunctional enzyme which also contains an ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain [2].
[1677] The other proteins structurally related to HCDH are: 

- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157) which converts 3-hydroxybutanoyl-CoA to acetoacetyl-CoA [3].
- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs (such as rabbit).

There are two major regions of similarity in the sequences of proteins of the HCDH family: the first one, located in the N-terminal, corresponds to the NAD-binding site; the second one is located in the center of the sequence. A signature pattern has been derived from this central region.

- Consensus pattern: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-[LIVMFYCT]-[LIVMFY]-x(2)-[GV]

[1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987).

[2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).

[3] Mullany P., Clayton C.L., Pallen M.J., Stone R., Al-Saleh A., Tabaqchali S. FEMS Microbiol. Lett. 124:61-67(1994).

[4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., de Jong W.W. J. Biol. Chem. 263:15462-15466(1988).

[1678] 737. 60s Acidic ribosomal protein
Proteins P1, P2, and P0, components of the eukaryotic ribosome stalk. New structural and functional aspects.

Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R, Rodriguez Gabriel MA, Guarinos E, Ballesta JP;

Biochem Cell Biol 1995;73:959-968.
This family includes archaebacterial L12, eukaryotic P0, P1 and P2.

[1679] 738. 6-phosphogluconate dehydrogenases 

6-phosphogluconate dehydrogenase (EC 1.1.1.44) (6PGD) catalyzes the third step in the hexose monophosphate shunt, the oxidative decarboxylation of 6-phosphogluconate to ribulose 5-phosphate.
[1680] Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequences are highly conserved [1]. A region which has been shown [2], from studies of the sheep 6PGD tertiary structure, to be involved in the binding of 6-phosphogluconate has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W
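A consensus pattern written in this notation can be checked against a sequence mechanically. The following is a minimal sketch, assuming the usual PROSITE conventions (x for any residue, [..] for allowed residues, {..} for excluded residues, and (n) or (n,m) for repeat counts). The function name prosite_to_regex and the toy sequence are illustrative assumptions and are not part of the original disclosure.

```python
import re

# One pattern element: a residue, x, [..] or {..}, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # excluded residues
        else:
            regex = core                       # single residue or [..] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

# The 6PGD signature, applied to a made-up sequence (not a real 6PGD).
SIX_PGD = "[LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W"
toy_sequence = "AAIADAAGNKGTGAWAA"
match = re.search(prosite_to_regex(SIX_PGD), toy_sequence)
print(match.start() if match else None)
```

The sketch deliberately rejects any element it does not recognise, so a pattern damaged in transcription raises an error instead of silently matching the wrong thing.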

[1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. Mol. Microbiol. 5:1081-1089(1991).

[2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S., Helliwell J.R., Pickersgill R.W., White S.W. EMBO J. 2:1009-1014(1983).

[1681] 739. (7tm 1) G-protein coupled receptors [1 to 4,E1,E2] (also called R7G) are an extensive group of hormones, neurotransmitters, odorants and light receptors which transduce extracellular signals by interaction with guanine nucleotide-binding (G) proteins. The receptors that are currently known to belong to this family are listed below.

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].
- Acetylcholine, muscarinic-type, M1 to M5.
- Adenosine A1, A2A, A2B and A3 [6].
- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].
- Angiotensin II types I and II.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.
- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.
- Chemokines C-C CC-CKR-1 to CC-CKR-8.
- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.
- Cholecystokinin-A and cholecystokinin-B/gastrin.
- Dopamine D1 to D5 [8].
- Endothelin ET-a and ET-b [9].
- fMet-Leu-Phe (fMLP) (N-formyl peptide).

- Follicle stimulating hormone (FSH-R) [10]. 

- Galanin. 

- Gastrin-releasing peptide (GRP-R). 

- Gonadotropin-releasing hormone (GNRH-R).
- Histamine H1 and H2 (gastric receptor I).
- Lutropin-choriogonadotropic hormone (LSH-R) [10].
- Melanocortin MC1R to MC5R.
- Melatonin.
- Neuromedin B (NMB-R).
- Neuromedin K (NK-3R).
- Neuropeptide Y types 1 to 6.
- Neurotensin (NT-R).
- Octopamine (tyramine), from insects.

- Odorants [11].
- Opioids delta-, kappa- and mu-types [12].
- Oxytocin (OT-R).
- Platelet activating factor (PAF-R).
- Prostacyclin.
- Prostaglandin D2.
- Prostaglandin E2, EP1 to EP4 subtypes.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].

- Somatostatin types 1 to 5. 
- Substance-K (NK-2R).
- Substance-P (NK-1R).
- Thrombin.
- Thromboxane A2.
- Thyrotropin (TSH-R) [10].
- Thyrotropin releasing factor (TRH-R).
- Vasopressin V1a, V1b and V2.
- Visual pigments (opsins and rhodopsin) [14].
- Proto-oncogene mas.
- A number of orphan receptors (whose ligand is not known) from mammals and birds.
- Caenorhabditis elegans putative receptors C06G4.5, C38C10.1, C43C3.2, T27D1.3 and ZC84.4.
- Three putative receptors encoded in the genome of cytomegalovirus: US27, US28, and UL33.
- ECRF3, a putative receptor encoded in the genome of herpesvirus saimiri.

[1682] The structure of all these receptors is thought to be identical. They have seven hydrophobic regions, each of which most probably spans the membrane. The N-terminus is located on the extracellular side of the membrane and is often glycosylated, while the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Most, but not all, of these receptors lack a signal peptide. The most conserved parts of these proteins are the transmembrane regions and the first two cytoplasmic loops. A conserved acidic-Arg-aromatic triplet is present in the N-terminal extremity of the second cytoplasmic loop [15] and could be implicated in the interaction with G proteins.

[1683] To detect this widespread family of proteins, a pattern that contains the conserved triplet and that also spans the major part of the third transmembrane helix has been developed.

- Consensus pattern: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]
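The acidic-Arg-aromatic triplet that anchors this pattern can be located on its own as a first screening step. The sketch below is an illustration only: the strict reading (Asp or Glu, then Arg, then Phe, Tyr or Trp) is an assumption about what "acidic" and "aromatic" cover, the looser residue classes are the ones that sit at those three positions of the consensus pattern, and the function name and test string are hypothetical.

```python
import re

# Strict reading of "acidic-Arg-aromatic": Asp/Glu, then Arg, then Phe/Tyr/Trp.
STRICT_TRIPLET = r"[DE]R[FYW]"
# Looser residue classes found at the same three positions of the consensus pattern.
PATTERN_TRIPLET = r"[DENH]R[FYWCSH]"

def find_triplets(sequence, strict=True):
    """Return (0-based start, triplet) for every candidate triplet in the sequence."""
    core = STRICT_TRIPLET if strict else PATTERN_TRIPLET
    # A lookahead reports overlapping candidates as well as adjacent ones.
    lookahead = re.compile("(?=(" + core + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]

toy_sequence = "AAADRYAAERWNRS"  # made-up string, not a real receptor
print(find_triplets(toy_sequence))
print(find_triplets(toy_sequence, strict=False))
```

A triplet hit alone is far from specific; the full consensus pattern adds the flanking transmembrane positions precisely to cut down such chance matches.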

[1] Strosberg A.D. Eur. J. Biochem. 196:1-10(1991).

[2] Kerlavage A.R. Curr. Opin. Struct. Biol. 1:394-401(1991).

[3] Probst W.C., Snyder L.A., Schuster D.I., Brosius J., Sealfon S.C. DNA Cell Biol. 11:1-20(1992).

[4] Savarese T.M., Fraser C.M. Biochem. J. 283:1-9(1992).

[5] Branchek T. Curr. Biol. 3:315-317(1993).

[6] Stiles G.L. J. Biol. Chem. 267:6451-6454(1992).

[7] Friell T., Kobilka B.K., Lefkowitz R.J., Caron M.G. Trends Neurosci. 11:321-324(1988).

[8] Stevens C.F. Curr. Biol. 1:20-22(1991).

[9] Sakurai T., Yanagisawa M., Masaki T. Trends Pharmacol. Sci. 13:103-107(1992).

[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J. Biochimie 73:109-120(1991).

[11] Lancet D., Ben-Arie N. Curr. Biol. 3:668-674(1993).

[12] Uhl G.R., Childers S., Pasternak G. Trends Neurosci. 17:89-93(1994).

[13] Barnard E.A., Burnstock G., Webb T.E. Trends Pharmacol. Sci. 15:67-70(1994).

[14] Applebury M.L., Hargrave P.A. Vision Res. 26:1881-1895(1986).

[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C. Gene 98:153-159(1991).

[1684] (7tm 1) Visual pigments (opsins) retinal binding site

Visual pigments [1,2] are the light-absorbing molecules that mediate vision. They consist of an apoprotein, opsin, covalently linked to the chromophore cis-retinal. Vision is effected through the absorption of a photon by cis-retinal which is isomerized to trans-retinal. This isomerization leads to a change of conformation of the protein. Opsins are integral membrane proteins with seven transmembrane regions that belong to family 1 of G-protein coupled receptors.
[1685] In vertebrates four different pigments are generally found. Rod cells, which mediate vision in dim light, contain the pigment rhodopsin. Cone cells, which function in bright light, are responsible for color vision and contain three or more color pigments (for example, in mammals: red, blue and green).
[1686] In Drosophila, the eye is composed of 600 facets or ommatidia. Each ommatidium contains eight photoreceptor cells (R1-R8): the R1 to R6 cells are outer cells, R7 and R8 inner cells. Each of the three types of cells (R1-R6, R7 and R8) expresses a specific opsin.

[1687] Proteins evolutionarily related to opsins include squid retinochrome, also known as retinal photoisomerase, which converts various isomers of retinal into 11-cis retinal, and mammalian retinal pigment epithelium (RPE) RGR [3], a protein that may also act in retinal isomerization.

[1688] The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh transmembrane helix. The pattern that has been developed includes this residue.

- Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY]

[K is the retinal binding site]

[1] Applebury M.L., Hargrave P.A. Vision Res. 26:1881-1895(1986).

[2] Fryxell K.J., Meyerowitz E.M. J. Mol. Evol. 33:367-378(1991).

[3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W. Biochemistry 33:13117-13125(1994).

[1689] The following descriptions of protein family functions are not provided by the Pfam or Prosite databases.
[1690] 740. BAH 

BAH domain. Number of members: 65 

[1] Medline: 97074677. Molecular cloning of polybromo, a nuclear protein containing multiple domains including five bromodomains, a truncated HMG-box, and two repeats of a novel domain. Nicolas RH, Goodwin GH; Gene 1996;175:233-240.

[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication and transcriptional regulation. Callebaut I, Courvalin J-C, Mornon JP; FEBS Lett 1999;446:189-193.

[1691] 741. ELM2.

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. Number of members: 10

[1692] 742. Euk porin. EUKARYOTIC_PORIN The major protein of the outer mitochondrial membrane of eukaryotes is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore for small hydrophilic molecules [1 to 4]. The channel adopts an open conformation at low or zero membrane potential and a closed conformation at potentials above 30-40 mV.

This protein contains about 280 amino acids and its sequence is composed of between 12 and 16 beta-strands that span the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates have at least three members (genes VDAC1, VDAC2 and VDAC3) [5].
A conserved region located at the C-terminal part of these proteins was selected as a signature pattern.

Consensus pattern: [YH]-x(2)-D-[SPCAD]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-[GSTAN]-[LIVMA]-x-[LIVMY]

[1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).

[2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).

[3] Dihanich M. Experientia 46:146-153(1990).

[4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).

[5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco hydro 19
Chitinases family 19 signatures

cross-reference(s) CHITINASE_19_1, CHITINASE_19_2

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence similarity chitinases belong to either family 18 or 19 in the classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain (see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid residues.

Two highly conserved regions were selected as signature patterns. The first one is located in the N-terminal section and contains one of the six cysteines which are conserved in most, if not all, of these chitinases and which is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
[1694] Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[2] Henrissat B. Biochem. J. 280:309-316(1991).

[1695] 744. MBD
Methyl-CpG binding domain

The Methyl-CpG binding domain (MBD) binds to DNA that contains one or more symmetrically methylated CpGs [1]. DNA methylation in animals is associated with alterations in chromatin structure and silencing of gene expression. MBD has negligible non-specific affinity for DNA. In vitro foot-printing with MeCP2 showed the MBD can protect a 12 nucleotide region surrounding a methyl CpG pair [1]. MBDs are found in several Methyl-CpG binding proteins and also DNA demethylase [2]. Number of members: 11

[1] Medline: 94232813. Dissection of the methyl-CpG binding domain from the chromosomal protein MeCP2. Nan X, Meehan RR, Bird A; Nucleic Acids Res 1993;21:4886-4892.

[2] Medline: 99158138. A mammalian protein with specific demethylase activity for mCpG DNA. Bhattacharya SK, Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583.

[1696] 745. Peptidase C1
Eukaryotic thiol (cysteine) proteases active sites

cross-reference(s) THIOL_PROTEASE_CYS; THIOL_PROTEASE_HIS; THIOL_PROTEASE_ASN

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].
- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase. An enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1, rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced, Arabidopsis thaliana A494, RD19A and RD21A.

- House-dust mites allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

[1697] Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

[1698] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced by a serine. Rat testin should not be confused with mouse testin which is a LIM-domain protein (see <PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.
[1699] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]
Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin hydrolase (Ser) and yeast YCP1 (Ser).
Note: the residue in position 5 of the pattern is always Gly except in papaya protease IV where it is Glu.

Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYV^ [N is the active site residue]

Note: these proteins belong to family C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].
[1] Dufour E. Biochimie 70:1335-1342(1988).
[2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1700] 746. Peptidase M22

Glycoprotease family signature cross-reference(s) GLYCOPROTEASE

Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is highly similar to the following uncharacterized proteins:

- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.

[1701] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in coordinating a metal ion such as zinc.

[1702] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]
Note: these proteins belong to family M22 in the classification of peptidases [2,E1].
[1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1703] 747. SAM. SAM domain (Sterile alpha motif)
It has been suggested that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation of numerous developmental processes in diverse eukaryotes. The SAM domain can potentially function as a protein interaction module through its ability to homo- and heterooligomerise with other SAM domains. Number of members: 81

[1] Medline: 96100659. SAM: A novel motif in yeast sterile alpha and Drosophila polyhomeotic proteins. Ponting CP; Prot Sci 1995;4:1928-1930.

[2] Medline: 97160498. SAM as a protein interaction domain involved in developmental regulation. Shultz J, Ponting CP, Hofmann K, Bork P; Prot Sci 1997;6:249-253.

[3] Medline: 99101382. The crystal structure of an Eph receptor SAM domain reveals a mechanism for modular dimerization. Stapleton D, Balan I, Pawson T, Sicheri F; Nat Struct Biol 1999;6:44-49.

[1704] 748. Tyrosinase signatures cross-reference(s) TYROSINASE_1; TYROSINASE_2 Tyrosinase (EC 1.14.18.1) [1] is a copper monooxygenase that catalyzes the hydroxylation of monophenols and the oxidation of o-diphenols to o-quinones. This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the formation of pigments such as melanins and other polyphenolic compounds.
[1705] Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has been shown [2] to be bound by three conserved histidine residues. The regions around these copper-binding ligands are well conserved and also shared by some hemocyanins, which are copper-containing oxygen carriers from the hemolymph of many molluscs and arthropods [3,4].

[1706] At least two proteins related to tyrosinase are known to exist in mammals: 

- TRP-1 (TYRP1) [5], which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic acid (DHICA) to indole-5,6-quinone-2-carboxylic acid.
- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase (EC 5.3.3.12) that catalyzes the conversion of DOPAchrome to DHICA. TRP-2 differs from tyrosinases and TRP-1 in that it binds two zinc ions instead of copper [7].

[1707] Other proteins that belong to this family are: 

- Plants polyphenol oxidases (PPO) (EC 1.10.3.1) which catalyze the oxidation of mono- and o-diphenols to o-diquinones [8].
- Caenorhabditis elegans hypothetical protein C02C2.1.

[1708] Two signature patterns for tyrosinase and related proteins have been derived. The first one contains two of the histidines that bind CuA, and is located in the N-terminal section of tyrosinase. The second pattern contains a histidine that binds CuB; that pattern is located in the central section of the enzyme.

Consensus pattern: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E [The two H's are copper ligands]
[1709] Consensus pattern: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a copper ligand]
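Because the second signature marks one specific residue (the histidine that binds CuB), a match can be reported as the position of that residue instead of the position of the whole motif. The sketch below is a hand translation of D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D into a regular expression with the histidine in a capturing group; the function name and the test string are hypothetical and are not taken from any real tyrosinase.

```python
import re

# D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D, with the copper-ligand His captured as group 1.
CUB_SIGNATURE = re.compile(r"DP.F[LIVMFYW].{2}(H).{3}D")

def cub_ligand_positions(sequence):
    """Return 1-based positions of the candidate CuB histidine in each match."""
    return [m.start(1) + 1 for m in CUB_SIGNATURE.finditer(sequence.upper())]

toy_sequence = "AAADPAFLAAHAAADAA"  # made-up string for illustration
print(cub_ligand_positions(toy_sequence))
```

Positions are reported 1-based to follow the usual residue-numbering convention; finditer returns non-overlapping matches, which is adequate for a motif of this length.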

[1] Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).
[2] Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).
[3] Linzen B. Naturwissenschaften 76:206-211(1989).
[4] Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).
[5] Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T., Solano F., Garcia-Borron J.C., Hearing V.J. EMBO J. 13:5818-5825(1994).
[6] Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A., Hearing V. EMBO J. 11:527-535(1992).
[7] Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A. Biochem. Biophys. Res. Commun. 204:1243-1250(1994).
[8] Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).

[1710] 749. (Mur Ligase) Folylpolyglutamate synthase signatures

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes ATP-dependent addition of glutamate moieties to tetrahydrofolate.
[1711] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. Two signature patterns were developed based on the conserved regions which are rich in glycine residues and could play a role in the catalytic activity and/or in substrate binding.
Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM]
Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[1712] [1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634(1993).

[1713] 750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature

The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a common pattern of primary structure [1,2,3] in the part of their sequence involved in the binding of zinc, and can be grouped together as a superfamily, known as the metzincins, on the basis of this sequence similarity. They can be classified into a number of distinct families [4,E1] which are listed below along with the proteases which are currently known to belong to these families.
[1714] Family M1

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian aminopeptidase N (EC 3.4.11.2).
- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a role in regulating growth and differentiation of early B-lineage cells.
- Yeast aminopeptidase yscII (gene APE2).
- Yeast alanine/arginine aminopeptidase (gene AAP1).
- Yeast hypothetical protein YIL137c.
- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of an epoxide moiety of LTA-4 to form LTB-4; it has been shown that it binds zinc and is capable of peptidase activity.

[1715] Family M2 

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for hydrolyzing angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic isozyme which has two active centers.

[1716] Family M3 

- Thimet oligopeptidase (EC 3.4.24.15), a mammalian enzyme involved in the cytoplasmic degradation of small peptides.
- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase).
- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the second stage of processing of some proteins imported in the mitochondrion.
- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD).
- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).
- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).
- Yeast hypothetical protein YKL134c.

[1717] Family M4 

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases (bacillolysins) (EC 3.4.24.28) from various species of Bacillus.
- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).
- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.
- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.

[1718] Family M5

- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.
[1719] Family M6

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of insect antibacterial proteins, attacins and cecropins.

[1720] Family M7

- Streptomyces extracellular small neutral proteases.
[1721] Family M8

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from various species of Leishmania.

[1722] Family M9 

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio alginolyticus.

[1723] Family M10A

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.
- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA).
- Secreted proteases A, B, C and G from Erwinia chrysanthemi.
- Yeast hypothetical protein YIL108w.

[1724] Family M10B

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC 3.4.24.7) (interstitial collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC 3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1), MMP-10 (EC 3.4.24.22) (stromelysin-2), MMP-11 (stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12). A protease that allows the embryo to digest the protective envelope derived from the egg extracellular matrix.
- Soybean metalloendoproteinase 1.

[1725] Family M11 

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).
[1726] Family M12A

- Astacin (EC 3.4.24.21), a crayfish endoprotease.
- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border metalloendopeptidase.
- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone formation and which expresses metalloendopeptidase activity. The Drosophila homolog of BMP-1 is the dorsal-ventral patterning protein tolloid.
- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN from Strongylocentrotus purpuratus.
- Caenorhabditis elegans protein toh-2.
- Caenorhabditis elegans hypothetical protein F42A10.8.
- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE and HCE) from the fish Oryzias latipes. These proteases participate in the breakdown of the egg envelope, which is derived from the egg extracellular matrix, at the time of hatching.

[17271 Family M12B 




- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in hemorrhage. Examples
are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72),
trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).

Mouse cell surface antigen MS2. 

[1728] Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to release the
active peptide.

Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein is very probably a zinc 
endopeptidase. 

Peptidase O from Lactococcus lactis (gene pepO). 
[1729] Family M27 

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins (BoNT). These toxins are
zinc proteases that block neurotransmitter release by proteolytic cleavage of synaptic proteins such as
synaptobrevins, syntaxin and SNAP-25 [7,8].

[1730] Family M30 

- Staphylococcus hyicus neutral metalloprotease.

[1731] Family M32

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus
which is most active at high temperature.

[1732] Family M34 

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin.

[1733] Family M35

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus.

[1734] Family M36

- Extracellular elastinolytic metalloproteinases from Aspergillus.

[1735] From the tertiary structure of thermolysin, the position of the residues acting as zinc ligands and those involved
in the catalytic activity are known. Two of the zinc ligands are histidines which are very close together in the sequence;
C-terminal to the first histidine is a glutamic acid residue which acts as a nucleophile and promotes the attack of a
water molecule on the carbonyl carbon of the substrate. A signature pattern which includes the two histidine and the
glutamic acid residues is sufficient to detect this superfamily of proteins.
[1736] Description of pattern(s) and/or profile(s)

Consensus pattern: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ] [The two H's are zinc ligands]
[E is the active site residue]

Sequences known to belong to this class detected by the pattern: ALL, except for members of families M5, M7 and M11.

Other sequence(s) detected in SWISS-PROT: 55; including Neurospora crassa conidiation-specific protein 13 which
could be a zinc-protease.
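The pattern notation used in these entries maps directly onto regular expressions, which makes a signature easy to check by hand. The following is a minimal sketch under stated assumptions, not part of the original entry: the helper name `prosite_to_regex` and the test peptide are hypothetical, and only the elements that occur in this section (single residues, `x`, `[...]`, `{...}` and repeat counts) are handled.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.strip().split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # literal residue or [allowed set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


ZINC_PATTERN = "[GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ]"
ZINC_REGEX = re.compile(prosite_to_regex(ZINC_PATTERN))

# Hypothetical peptide containing an H-E-x-x-H motif, used only for illustration.
peptide = "ITGTSVVAHELTHAVTDYTAG"
hit = ZINC_REGEX.search(peptide)
print(hit.group(0) if hit else "no match")
```

The excluded set `{DEHRKP}` becomes a negated character class, so a peptide carrying Asp at that position is rejected even though both histidines and the glutamate are present.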

[ 1] Jongeneel C.V., Bouvier J., Bairoch A. FEBS Lett. 242:211-214(1989).
[ 2] Murphy G.J.P., Murphy G., Reynolds J.J. FEBS Lett. 289:4-7(1991).
[ 3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B., Stoecker W. Zoology 99:237-246(1996).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).
[ 5] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[ 6] Hite L.A., Fox J.W., Bjarnason J.B.
[ 7] Montecucco C., Schiavo G. Trends Biochem. Sci. 18:324-327(1993).
[ 8] Niemann H., Blasi J., Jahn R. Trends Cell Biol. 4:179-185(1994).

[1737] 751. PseudoU_synt_1

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon stem and loop of transfer
RNAs. Pseudouridine is an isomer of uridine (5-(beta-D-ribofuranosyl) uracil), and is the most abundant modified
nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

[1738] [1] Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces cerevisiae contains
one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B,
Grosjean H; Biochemistry 1998;37:7268-7276.
[1739] 752. EPSP synthase signatures 

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the sixth step in the
biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and fungi
(where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP synthase
has been extensively studied as it is the target of the potent herbicide glyphosate which inhibits the enzyme.

[1740] The sequence of EPSP from various biological sources shows that the structure of the enzyme has been well
conserved throughout evolution. Two conserved regions were selected as signature patterns. The first pattern
corresponds to a region that is part of the active site and which is also important for the resistance to glyphosate [2]. The
second pattern is located in the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.

[1741] Description of pattern(s) and/or profile(s)

[1742] Consensus pattern: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]
Consensus pattern: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVM...

[ 1] Stallings W.C., Abdel-Meguid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber N.K., Stegeman R.A.,
Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc. Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).
[ 2] Padgette S.R., Re D.B., Gasser C.S., Eichholtz D.A., Frazier R.B., Hironaka C.M., Levine E.B., Shah D.M., Fraley
R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

[1743] 753. Glyco_hydro_18

Glycosyl hydrolases family 18. Number of members: 173

[1] Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis A, Tews I, Dauter Z,
Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.
[1744] 754. Esterase
Putative esterase

This family contains Esterase D Swiss:P10768. However it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.
Number of members: 36

[1745] 755. (HMA) Heavy-metal-associated domain 

A conserved domain of about 30 amino acid residues has been found [1] in a number of proteins that transport or
detoxify heavy metals. This domain contains two conserved cysteines that could be involved in the binding of these
metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in:

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The human copper ATPases ATP7A
and ATP7B, which are respectively involved in Menkes' and Wilson's diseases. ATP7A and ATP7B both contain 6
tandem copies of the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus
faecalis and synA from Synechococcus contain one copy of the HMA domain. The cadmium ATPases cadA from
Bacillus firmus and from plasmid pI258 from Staphylococcus aureus also contain a single HMA domain, while a
chromosomal Staphylococcus aureus cadA contains two copies. Other, less characterized ATPases that contain
the HMA domain are: fixI from Rhizobium meliloti, pacS from Synechococcus strain PCC 7942, Mycobacterium
leprae ctpA and ctpB and Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are
located in the N-terminal section.

- Mercuric reductase (EC 1.16.1.1) (gene merA) which is generally encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. Mercuric reductase is a class-1 pyridine nucleotide-disulphide oxidoreductase (see
<PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus
strain RC607 which has two) in the N-terminal part of merA.

- Mercuric transport protein periplasmic component (gene merP), also encoded by plasmids carried by
mercury-resistant Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds to one Hg(2+) ion
and which passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is a HMA domain.

- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper.

[1746] The consensus pattern for HMA spans the complete domain. 
[1747] Description of pattern(s) and/or profile(s) 

Consensus pattern: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS] [The
two C's probably bind metals]
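Because several of the proteins listed above carry more than one HMA domain, it can be useful to count pattern hits rather than only test for presence. The sketch below is illustrative only: the regular expression is a hand translation of the HMA consensus pattern, and the function name `find_hma_domains` and the synthetic test sequence are assumptions, not material from the entry.

```python
import re

# Hand translation of the HMA consensus pattern into a regular expression.
HMA_REGEX = re.compile(
    r"[LIVN].{2}[LIVMFA].C.[STAGCDNH]C.{3}[LIVFG].{3}[LIV].{9,11}[IVA].[LVFYS]"
)


def find_hma_domains(sequence):
    """Return (start, end) spans (0-based, end exclusive) of non-overlapping hits."""
    return [(m.start(), m.end()) for m in HMA_REGEX.finditer(sequence)]


# Hypothetical 29-residue unit built to satisfy the pattern; K is filler only.
unit = "VKKIKCKSC" + "KKK" + "I" + "KKK" + "L" + "K" * 9 + "VKV"
two_copies = unit + "KKKKK" + unit

print(find_hma_domains(two_copies))
```

The variable gap `x(9,11)` is the only element that lets hits differ in length, so the reported spans may be 29 to 31 residues long.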

[ 1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).
[ 2] Lin S.-J., Culotta V.L. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).

[1748] 756. (Peptidase M10) Matrixins cysteine switch
PROSITE cross-reference(s): CYSTEINE_SWITCH 

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature
enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream
of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3];
a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been
called the 'cysteine switch' or 'autoinhibitor region'.
A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1749] Description of pattern(s) and/or profile(s)

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]
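Since the annotation singles out one residue of the octapeptide, a scan can report the position of that cysteine rather than just the hit. This is a small sketch under assumptions: the regular expression is a hand translation of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ], and the function name `chelating_cysteines` and the test fragment are hypothetical.

```python
import re

# Hand translation of P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].
CYS_SWITCH = re.compile(r"PRC[GN].P[DR][LIVSAPKQ]")


def chelating_cysteines(sequence):
    """Return 1-based positions of the cysteine in every cysteine-switch hit."""
    # The cysteine is the third residue of the octapeptide: 0-based offset 2,
    # plus 1 to convert to the 1-based numbering usual for protein sequences.
    return [m.start() + 2 + 1 for m in CYS_SWITCH.finditer(sequence)]


# Hypothetical propeptide fragment used only for illustration.
fragment = "MQKFFGLKVTGKPRCGVPDVAQF"
print(chelating_cysteines(fragment))
```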

[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).
[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).
[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
[ 5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1750] 757. (Peptidase S8) Serine proteases, subtilase family, active sites 

PROSITE cross-reference(s): PS00136; SUBTILASE_ASP, PS00137; SUBTILASE_HIS, PS00138; SUBTILASE_SER
Subtilases [1,2] are an extensive family of serine proteases whose catalytic activity is provided by a charge relay system
similar to that of the trypsin family of serine proteases but which evolved by independent convergent evolution. The
sequences around the residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely different
from those of the analogous residues in the trypsin serine proteases and can be used as signatures specific to that
category of proteases.
The subtilase family currently includes the following proteases:

- Subtilisins (EC 3.4.21.62), these alkaline proteases from various Bacillus species have been the target of numerous
studies in the past thirty years.
- Alkaline elastase YaB from Bacillus sp. (gene ale).
- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).
- Aqualysin I from Thermus aquaticus (gene pstI).
- AspA from Aeromonas salmonicida.
- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).
- C5A peptidase from Streptococcus pyogenes (gene scpA).
- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.
- Extracellular serine protease from Serratia marcescens.
- Extracellular protease from Xanthomonas campestris.
- Intracellular serine protease (ISP) from various Bacillus.
- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).
- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr).
- Nisin leader peptide processing protease nisP from Lactococcus lactis.
- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).
- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.
- Calcium-dependent protease from Anabaena variabilis (gene prcA).
- Halolysin from halophilic bacteria sp. 172p1 (gene hly).
- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).
- Alkaline proteinase from Cephalosporium acremonium (gene alp).
- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).
- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.
- KEX-1 protease from Kluyveromyces lactis.
- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).
- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp).
- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK).
- Proteinase R from Tritirachium album (gene proR).
- Proteinase T from Tritirachium album (gene proT).
- Subtilisin-like protease III from yeast (gene YSP3).
- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea.
- Furin (EC 3.4.21.85), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and PACE4 protease from mammals, other
vertebrates, and invertebrates. These proteases are involved in the processing of hormone precursors at sites
comprised of pairs of basic amino acid residues [3].
- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from human.
- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins consist of two domains: an N-terminal
subtilase catalytic domain and a C-terminal ABC transporter domain (see <PDOC00185>).

[1751] Description of pattern(s) and/or profile(s)

Consensus pattern: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH] [D is the active site residue]
Consensus pattern: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM] [H is the active site residue]
Consensus pattern: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG] [S is the active site residue]

Note: if a protein includes at least two of the three active site signatures, the probability of it being a serine protease
from the subtilase family is 100%.
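The two-out-of-three rule stated in the note can be expressed as a short check. The sketch below is an illustration under assumptions rather than a reference implementation: the three regular expressions are hand translations of the Asp, His and Ser active-site patterns, and the function names and the short test fragments are hypothetical.

```python
import re

# Hand translations of the three active-site consensus patterns.
SIGNATURES = {
    "ASP": re.compile(r"[STAIV].[LIVMF][LIVM]D[DSTA]G[LIVMFC].{2,3}[DNH]"),
    "HIS": re.compile(r"HG[STM].[VIC][STAGC][GS].[LIVMA][STAGCLV][SAGM]"),
    "SER": re.compile(r"GTS.[SA].P.{2}[STAVC][AG]"),
}


def signatures_found(sequence):
    """Return the sorted names of the active-site signatures present in sequence."""
    return sorted(name for name, rx in SIGNATURES.items() if rx.search(sequence))


def is_subtilase_candidate(sequence):
    """Apply the rule from the note: at least two of the three signatures."""
    return len(signatures_found(sequence)) >= 2


# Hypothetical fragments chosen only because they satisfy one pattern each.
asp = "VAVIDSGIDSSH"
his = "HGTHVAGTVAA"
ser = "GTSMASPHVAG"
linker = "KKKKKKKK"

print(signatures_found(asp + linker + his + linker + ser))
print(is_subtilase_candidate(asp + linker + ser))
print(is_subtilase_candidate(ser))
```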

Note: these proteins belong to family S8 in the classification of peptidases [5,E1].

[ 1] Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).
[ 2] Siezen R.J. (In) Proceeding subtilisin symposium, Hamburg, (1992).
[ 3] Barr P.J. Cell 66:1-3(1991).
[ 4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures 
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a
protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important
role in DNA replication, recombination and repair.

[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids.
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens.
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication
are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).
- Xenopus Mt-SSBs and Mt-SSBr.
- Drosophila MtSSB.
- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the
N-terminal section of the SSBs. The second is a centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.
[1756] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DE...
Consensus pattern: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[ 1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[ 2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

[1757] 759. KDPG and KHG aldolases active site signatures

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the interconversion of
4-hydroxy-2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14)
(KDPG-aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and
glyceraldehyde 3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of
approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation
of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes,
an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and
the second, the lysine involved in the Schiff-base formation.

[1761] Description of pattern(s) and/or profile(s)
Consensus pattern: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]

[1762] [ 1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

[1763] 760. AP endonucleases family 1 signatures. PROSITE cross-reference(s): PS00726;
AP_NUCLEASE_F1_1, PS00727; AP_NUCLEASE_F1_2, PS00728; AP_NUCLEASE_F1_3

[1764] DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin or those that generate
oxygen radicals produce a variety of lesions in DNA. Amongst these is base-loss which forms apurinic/apyrimidinic
(AP) sites or strand breaks with atypical 3' termini. DNA repair at the AP sites is initiated by specific endonuclease
cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3' terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups
the enzymes listed below [1].

- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).
- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).
- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).
- Drosophila recombination repair protein 1 (gene Rrp1).
- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

[1766] Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues. Rrp1 and arp both
contain additional and unrelated sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp).
[1767] Three signature patterns were developed for this family of enzymes. The patterns are based on the most
conserved regions. The first pattern contains a glutamate which has been shown [2], in the Escherichia coli enzyme,
to bind a divalent metal ion such as magnesium or manganese.

[1768] Consensus pattern: [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]
Consensus pattern: D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus pattern: N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S

[ 1] Barzilay G., Hickson I.D. BioEssays 17:713-719(1995).
[ 2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

[1769] 761. (ER) Enhancer of rudimentary signature. PROSITE cross-reference(s): PS01290; ER
[1770] The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104 residues whose function
is not yet clear. From an evolutionary point of view, it is highly conserved [1] and has been found to exist in probably
all multicellular eukaryotic organisms. It has been proposed that this protein plays a role in the cell cycle.

[1771] A conserved region in the central part of the protein was selected as a signature pattern.

[1772] Consensus pattern: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S
[1773] [ 1] Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota S.I. Gene 186:189-195(1997).

[1774] 762. (ETF alpha) Electron transfer flavoprotein alpha-subunit signature. PROSITE cross-reference(s): 

PS00696; ETF_ALPHA 

[1775] The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial
dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a
heterodimer that consists of an alpha and a beta subunit and which binds one molecule of FAD per dimer. A similar
system also exists in some bacteria.

[1776] The alpha subunit of ETF is a protein of about 32 Kd which is structurally related to the bacterial nitrogen
fixation protein fixB which could play a role in a redox process and feed electrons to ferredoxin.
[1777] Other related proteins are:

- Escherichia coli hypothetical protein ydiR.
- Escherichia coli hypothetical protein ygcQ.

[1778] A highly conserved region which is located in the C-terminal section was selected as a signature pattern for
these proteins.

[1779] Consensus pattern: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[1780] 763. (lectin c) C-type lectin domain signature and profile 

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1, PS50041; C_TYPE_LECTIN_2
[1781] A number of different families of proteins share a conserved domain which was first characterized in some
animal lectins and which seems to function as a calcium-dependent carbohydrate-recognition domain [1,2,3]. This
domain, which is known as the C-type lectin domain (CTL) or as the carbohydrate-recognition domain (CRD), consists
of about 110 to 130 residues. There are four cysteines which are perfectly conserved and involved in two disulfide
bonds. A schematic representation of the CTL domain is shown below.



[Schematic representation of the CTL domain; the connectors marking the disulfide bonds are not reproduced.]

xcxxxxcxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxWxCxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'c': optional cysteine involved in a disulfide bond.
'*': position of the pattern.



[1782] The categories of proteins, in which the CTL domain has been found, are listed below. 

[1783] Type-II membrane proteins where the CTL domain is located at the C-terminal extremity of the proteins:

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPRs mediate the endocytosis
of plasma glycoproteins from which the terminal sialic acid residue in their carbohydrate moieties has been removed.
- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays an essential role in the
regulation of IgE production and in the differentiation of B cells.
- Kupffer cell receptor. A receptor with an affinity for galactose and fucose, that could be involved in endocytosis.
- A number of proteins expressed on the surface of natural killer T-cells: NKG2, NKR-P1, YE1/88 (Ly-49), CD69
and on B-cells: CD72, LyB-2. The CTL-domain in these proteins is distantly related to other CTL-domains; it is
unclear whether they are likely to bind carbohydrates.

[1784] Proteins that consist of an N-terminal collagenous domain followed by a CTL-domain [5]; these proteins are
sometimes called 'collectins':

- Pulmonary surfactant-associated protein A (SP-A). SP-A is a calcium-dependent protein that binds to surfactant
phospholipids and contributes to lowering the surface tension at the air-liquid interface in the alveoli of the mammalian
lung.
- Pulmonary surfactant-associated protein D (SP-D).
- Conglutinin, a calcium-dependent lectin-like protein which binds to a yeast cell wall extract and to immune
complexes through the complement component (iC3b).
- Mannan-binding proteins (MBP) (also known as mannose-binding proteins). MBPs bind mannose and
N-acetyl-D-glucosamine in a calcium-dependent manner.
- Bovine collectin-43 (CL-43).

[1785] Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the interaction of leukocytes
with platelets or vascular endothelium. Structurally, selectins consist of a long extracellular domain, followed by a
transmembrane region and a short cytoplasmic domain. The extracellular domain is itself composed of a CTL-domain,
followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known selectins are:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion molecule-1 (LAM-1), leu-8, gp90-mel,
or LECAM-1).
- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin or LECAM-2).

[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x.

- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3). The ligand recognized by
GMP-140 is Lewis x.

[1787] Large proteoglycans that contain a CTL-domain followed by one copy of a SCR/Sushi repeat, in their
C-terminal section:

- Aggrecan (cartilage-specific proteoglycan core protein). This proteoglycan is a major component of the extracellular
matrix of cartilaginous tissues where it has a role in the resistance to compression.
- Brevican.
- Neurocan.
- Versican (large fibroblast proteoglycan), a large chondroitin sulfate proteoglycan that may play a role in intercellular
signalling.

[1788] In addition to the CTL and Sushi domains, these proteins also contain, in their N-terminal domain, an Ig-like
V-type region, two or four link domains (see <PDOC00955>) and up to two EGF-like repeats.
[1789] Two type-I membrane proteins:

- Mannose receptor from macrophages. This protein mediates the endocytosis of glycoproteins by macrophages in
several recognition and uptake processes. Its extracellular section consists of a fibronectin type II domain followed
by eight tandem repeats of the CTL domain.
- 180 Kd secretory phospholipase A2 receptor (PLA2-R). A protein whose structure is highly similar to that of the
mannose receptor.
- DEC-205 receptor. This protein is used by dendritic cells and thymic epithelial cells to capture and endocytose
diverse carbohydrate-binding antigens and direct them to antigen-processing cellular compartments. The DEC-205
extracellular section consists of a fibronectin type II domain followed by ten tandem repeats of the CTL domain.
- Silk moth hemocytin, a humoral lectin which is involved in a self-defence mechanism. It is composed of 2 FA58C
domains (see <PDOC00988>), a CTL domain, 2 VWFC domains (see <PDOC00928>), and a CTCK (see <PDOC00912>).
[1790] Various other proteins that uniquely consist of a CTL domain:

- Invertebrate soluble galactose-binding lectins. A category to which belong a humoral lectin from a flesh fly;
echinoidin, a lectin from the coelomic fluid of a sea urchin; BRA-2 and BRA-3, two lectins from the coelomic fluid of a
barnacle; a lectin from the tunicate Polyandrocarpa misakiensis and a newt oviduct lectin. The physiological
importance of these lectins is not yet known but they may play an important role in defense mechanisms.
- Pancreatic stone protein (PSP) (also known as pancreatic thread protein (PTP), or reg), a protein that might act
as an inhibitor of spontaneous calcium carbonate precipitation.
- Pancreatitis associated protein (PAP), a protein that might be involved in the control of bacterial proliferation.
- Tetranectin, a plasma protein that binds to plasminogen and to isolated kringle 4.
- Eosinophil granule major basic protein (MBP), a cytotoxic protein.
- A galactose specific lectin from a rattlesnake.
- Two subunits of a coagulation factor IX/factor X-binding protein (IX/X-bp), a snake venom anticoagulant protein
which binds with factors IX and X in the presence of calcium.
- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake (PLI-A and PLI-B).
- A lipopolysaccharide-binding protein (LPS-BP) from the hemolymph of a cockroach [8].
- Sea raven antifreeze protein (AFP) [9].

[1791] As a signature pattern for this domain, the C-terminal region with its three conserved cysteines was selected.
[1792] Consensus pattern: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-C [The
three C's are involved in disulfide bonds]

Note: all CTL domains have a Trp residue five positions before the second Cys, with the exception of tunicate lectin
and cockroach LPS-BP which have Leu.
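The Trp/Leu alternative described in the note can be read directly off a pattern hit by capturing the [WL] position. This is a minimal sketch under assumptions: the regular expression is a hand translation of the C-type lectin consensus pattern, and the function name `wl_residue` and both synthetic test sequences are hypothetical.

```python
import re

# Hand translation of the C-type lectin pattern; the [WL] position is captured.
CTL_REGEX = re.compile(
    r"C[LIVMFYATG].{5,12}([WL]).[DNSR].{2}C.{5,6}[FYWLIVSTA][LIVMSTA]C"
)


def wl_residue(sequence):
    """Return 'W' or 'L' found five positions before the second Cys, or None if no hit."""
    m = CTL_REGEX.search(sequence)
    return m.group(1) if m else None


# Hypothetical sequences built to satisfy the pattern; K is filler only.
trp_form = "CA" + "K" * 5 + "W" + "KDKK" + "C" + "K" * 5 + "FLC"
leu_form = trp_form.replace("W", "L")

print(wl_residue(trp_form), wl_residue(leu_form))
```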

Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive
than the pattern, you should use it if you have access to the necessary software tools to do so.

[ 1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988).
[ 2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993).
[ 3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993).
[ 4] Spiess M. Biochemistry 29:10009-10018(1990).
[ 5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-1615(1991).
[ 6] Siegelman M. Curr. Biol. 1:125-128(1991).
[ 7] Lasky L.A. Science 258:964-969(1992).
[ 8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991).
[ 9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992).

[1793] 764. (SRCR) Speract receptor repeated domain signature

PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR
[1794] The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of 500 amino acid
residues [1]. Structurally it consists of a large extracellular domain of 450 residues, followed by a transmembrane
region and a small cytoplasmic domain of 12 amino acids. The extracellular domain contains four repeats of a 115
amino acid domain. There are 17 positions that are perfectly conserved in the four repeats, among them six
cysteines, six glycines, and three glutamates.
[1795] Such a domain is also found, once, in the C-terminal section of mammalian macrophage scavenger receptor
type I [2], a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during
atherogenesis.

[1796] The signature pattern that was derived spans part of the N-terminal section of the domain and contains 8 of
the 17 conserved residues.
[1797] Consensus pattern: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G

[ 1] Dangott L.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A. 86:2128-2132(1989).
[ 2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A., Krieger M. Proc. Natl.
Acad. Sci. U.S.A. 87:8810-8814(1990).
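
The consensus patterns in these entries use a compact notation: `x` is any residue, `[..]` lists the residues allowed at a position, `{..}` lists the residues forbidden there, and `(n)` or `(n,m)` gives a repeat count. The following is a minimal sketch, not part of the original disclosure, of how such a pattern could be translated into a regular expression and used to scan a sequence. The function name `prosite_to_regex` and the toy sequence are hypothetical and chosen only for illustration; the toy sequence is synthetic and is not a real speract receptor fragment.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Split a trailing repeat count such as "(5)" or "(8,9)" off the element.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} lists forbidden residues
        # "[..]" classes and single letters are already valid regex syntax.
        if lo:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)

# Speract receptor repeated domain signature, as given in the entry above.
SPERACT = "G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G"

# Synthetic sequence (assumption): two leading residues, one stretch that
# satisfies the pattern with alanine at every unconstrained position, two trailing residues.
toy = ("MK" + "G" + "A" * 5 + "G" + "AA" + "E" + "A" * 6 + "WG" + "AA" + "C"
       + "AAA" + "F" + "A" * 8 + "C" + "AAA" + "G" + "LL")

hit = re.search(prosite_to_regex(SPERACT), toy)
if hit:
    print("match at residues", hit.start() + 1, "to", hit.end())
```

A regular expression only reproduces the pattern; it does not replace the more sensitive profile searches that some entries recommend.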


[1798] 765. Bac_surface_Ag
Bacterial surface antigen

This entry includes the following surface antigens: D15 antigen from H. influenzae, OMA87 from P. multocida, OMP85
from N. meningitidis and N. gonorrhoeae. Number of members: 14

[1] Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of Haemophilus influenzae.
Flack FS, Loosmore S, Chong P, Thomas WR; Gene 1995;156:97-99.

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the oma87 gene encoding the
Pasteurella multocida 87-kilodalton outer membrane antigen. Ruffolo CG, Adler B; Infect Immun 1996;64:3161-3167.

[1799] 766. BRCA1 C Terminus (BRCT) domain

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA
damage. It has been suggested that the Retinoblastoma protein contains a divergent BRCT domain; this has not been
included in this family. The BRCT domain of XRCC1 forms a homodimer in the crystal structure (Medline: 99016060).
This suggests that pairs of BRCT domains associate as homo- or heterodimers. Number of members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul SF, Bork P; Nature
Genet 1996;13:266-268.

[2] Medline: 97153217. From BRCA1 to RAP1: A widespread BRCT module closely associated with DNA repair.
Callebaut I, Mornon JP; Febs Lett 1997;400:25-30.

[3] Medline: 97166552. A superfamily of conserved domains in DNA damage responsive cell cycle checkpoint
proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin EV; Faseb J 1997;11:68-76.

[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ; Nucleic Acids Res 1997;25:3389-3402.

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein interaction module. Zhang
X, Morera S, Bates PA, Whitehead PC, Coffer AI, Hainbucher K, Nash RA, Sternberg MJ, Lindahl T, Freemont PS;
[1800] 767. Kappa casein

Kappa-casein is a mammalian milk protein involved in a number of important physiological processes. In the gut, the
ingested protein is split into an insoluble peptide (para kappa-casein) and a soluble hydrophilic glycopeptide
(caseinomacropeptide).

Caseinomacropeptide is responsible for increased efficiency of digestion, prevention of neonate hypersensitivity to
ingested proteins, and inhibition of gastric pathogens. Number of members: 56

[1801] [1] Medline: 98072500. Nucleotide sequence evolution at the kappa-casein locus: evidence for positive
selection within the family Bovidae. Ward TJ, Honeycutt RL, Derr JN; Genetics 1997;147:1863-1872.
[1802] 768. Chitinases family 18 active site
PROSITE cross-reference(s): CHITINASE_18

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages
in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or 19 in the
classification of glycosyl hydrolases [2,E1]. Chitinases of family 18 (also known as classes III or V) group a variety
of proteins:

a) Chitinases from:

- Prokaryotes such as Alteromonas, Bacillus, Serratia, Streptomyces, etc.
- Plants such as Arabidopsis, cucumber, bean, tobacco, etc.
- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc.
- Nematode (Brugia malayi).
- Insects (Manduca sexta).
- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins:

- Hevamine, a rubber tree protein with chitinase and lysozyme activities.
- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase.
- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1.96).
- Mammalian di-N-acetylchitobiase, which is involved in the degradation of asparagine-linked glycoproteins.
- Human cartilage glycoprotein Gp-39.
- Jack bean concanavalin B (conB), a protein that has lost its catalytic activity.

[1803] Site-directed mutagenesis experiments [3] and crystallographic data [4,5] have shown that a conserved
glutamate is involved in the catalytic mechanism and probably acts as a proton donor. This glutamate is at the
extremity of the best conserved region in these proteins.

[1804] Consensus pattern: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E [E is the active site residue]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).
[ 3] Watanabe T., Kohori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol. Chem.
268:18567-18572(1993).
[ 4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E. Structure
2:1169-1180(1994).
[ 5] van Scheltinga A.C.T., Kalk K.H., Beintema J.J., Dijkstra B.W. Structure 2:1181-1189(1994).
[1805] 769. gag_p17. gag gene protein p17 (matrix protein).

The matrix protein forms an icosahedral shell associated with the inner membrane of the mature immunodeficiency
virus. Number of members: 1598

[1806] [1] Medline: 95055757. Three-dimensional structure of the human immunodeficiency virus type 1 matrix
protein. Massiah MA, Starich MR, Paschall C, Summers MF, Christensen AM, Sundquist WI; J Mol Biol 1994;244:198-223.
[1807] 770. GDA1/CD39 family of nucleoside phosphatases signature
PROSITE cross-reference(s): GDA1_CD39_NTPASE

A number of nucleoside diphosphate and triphosphate hydrolases as well as some yet uncharacterized proteins have
been found to belong to the same family [1,2]. This family currently consists of:

- Yeast guanosine-diphosphatase (EC 3.6.1.42) (GDPase) (gene GDA1). GDA1 is a Golgi integral membrane enzyme
  that catalyzes the hydrolysis of GDP to GMP.
- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both ATP and ADP to produce
  AMP.
- Mammalian vascular ATP-diphosphohydrolase (EC 3.6.1.5) (also known as lymphoid cell activation antigen CD39).
- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase hydrolyses various nucleoside
  triphosphates to produce the corresponding nucleoside mono- and diphosphates. This enzyme is secreted into
  the parasitophorous vacuole of the invaded host cell, a specialized compartment where the parasite resides
  intracellularly.
- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase).
- Caenorhabditis elegans hypothetical protein C33H5.14.
- Caenorhabditis elegans hypothetical protein R07E4.4.
- Yeast chromosome V hypothetical protein YER005w.

[1808] The above uncharacterized proteins all seem to be membrane-bound.

[1809] All these proteins share a number of conserved domains. The best conserved of these domains has been
selected. It is located in the central section of the proteins.

[1810] Consensus pattern: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]

[ 1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996).
[ 2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M., Mancilla M., Valenzuela M.A.,
Verjovski-Almeida S. J. Biol. Chem. 271:22139-22145(1996).

[1811] 771. GTP cyclohydrolase I signatures
PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1, GTP_CYCLOHYDROL_1_2

GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate
from GTP. This reaction is the first step in the biosynthesis of tetrahydrofolate in prokaryotes, of
tetrahydrobiopterin in vertebrates, and of pteridine-containing pigments in insects.

[1812] GTP cyclohydrolase I is a protein of from 190 to 250 amino acid residues. The comparison of the sequence
of the enzyme from bacterial and eukaryotic sources shows that the structure of this enzyme has been extremely well
conserved throughout evolution [1].

[1813] Two conserved regions were selected as signature patterns. The first contains a perfectly conserved
tetrapeptide which is part of the GTP-binding pocket [2]; the second region also contains conserved residues
involved in GTP-binding.

[1814] Consensus pattern: [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-x(3)-[ST]-x-C-E-H-H
Consensus pattern: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]

[ 1] Maier J., Witter K., Guetlich M., Ziegler I., Werner T., Ninnemann H. Biochem. Biophys. Res. Commun.
212:705-711(1995).
[ 2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure 3:459-466(1995).

[1815] 772. IlvC. Acetohydroxy acid isomeroreductase

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This
reaction is the second in the synthetic pathway of the essential branched side chain amino acids valine and isoleucine.
Number of members: 29

[1816] [1] Medline: 97361822. The crystal structure of plant acetohydroxy acid isomeroreductase complexed with
NADPH, two magnesium ions and a herbicidal transition state analog determined at 1.65 A resolution. Biou V, Dumas
R, Cohen-Addad C, Douce R, Job D, Pebay-Peyroula E; EMBO J 1997;16:3405-3415.
[1817] 773. Prokaryotic membrane lipoprotein lipid attachment site

PROSITE cross-reference(s): PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such
processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rplA and rplB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natronobacterium pharaonis [4], a membrane associated copper-binding protein (this is the first
  archaebacterial protein known to be modified in such a fashion).

[1818] From the precursor sequences of all these proteins, we derived a consensus pattern and a set of rules to
identify this type of post-translational modification.

[1819] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must
be at least one Lys or one Arg in the first seven positions of the sequence.

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem.
269:14939-14945(1994).
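
The lipoprotein entry combines a consensus pattern with two positional rules, so a plain pattern match is not sufficient on its own. The sketch below, which is illustrative only, applies the pattern and then both rules to a precursor sequence. The function name `lipid_attachment_sites` and the three test sequences are hypothetical (the sequences are synthetic, not taken from any protein listed above), and positions are assumed to be counted from 1 at the first residue of the precursor.

```python
import re

# Consensus pattern from the entry, hand-translated to a regular expression:
# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
# The lookahead lets overlapping candidate sites be reported independently.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C)")

def lipid_attachment_sites(precursor):
    """Return 1-based positions of cysteines that satisfy the pattern and both rules."""
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(residue in "KR" for residue in precursor[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(precursor):
        # The pattern spans 11 residues and the cysteine is the last of them.
        cys_pos = m.start() + 11
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites

# Synthetic precursors (assumptions, for illustration only).
accepted = "MKKTALLLSALAVSLLLAGCSTPQ"   # pattern ends at Cys-20, Lys at positions 2 and 3
too_early = "MKALLAALLSAGCST"           # pattern matches, but the cysteine is at position 13
no_basic = "MAATALLLSALAVSLLLAGCSTPQ"   # pattern matches, but no Lys/Arg in the first seven

print(lipid_attachment_sites(accepted))
print(lipid_attachment_sites(too_early))
print(lipid_attachment_sites(no_basic))
```

Keeping the rules outside the regular expression makes each rejection reason easy to inspect separately.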

[1820] 774. Aminoacyl-transfer RNA synthetases class-II signatures

PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1, AA_TRNA_LIGASE_II_2

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to
specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
[1821] The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine,
proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding
pattern in their catalytic domain for the binding of ATP and amino acid which is different to the Rossmann fold
observed for the class I synthetases [7].
[1822] Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions
are present [2,5,8]. Signature patterns from two of these regions have been derived.

[1823] Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMS

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).
[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).
[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

[1824] 775. X. Trans-activation protein X

This protein is found in hepadnaviruses where it is indispensable for replication. Number of members: 91
[1825] 776. Thymidylate synthase active site

[1826] Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation of dUMP to dTMP with
concomitant conversion of 5,10-methylenetetrahydrofolate to dihydrofolate. Thymidylate synthase plays an essential
role in DNA synthesis and is an important target for certain chemotherapeutic drugs.

[1827] Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species except in protozoans and plants,
where it exists as a bifunctional enzyme that includes a dihydrofolate reductase domain.

[1828] A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6-dihydro-dUMP
intermediate). The sequence around the active site of this enzyme is conserved from phages to vertebrates.
[1829] Consensus pattern: R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV]
[C is the active site residue]

[ 1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980).
[ 2] Ross P., O'Gara F., Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990).

[1830] 777. Glycosyl hydrolases family 31 signatures

[1831] It has been shown [1,2,3,E1] that the following glycosyl hydrolases can be, on the basis of sequence
similarities, classified into a single family:

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase active at low pH, which
  hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen, maltose, and isomaltose.
- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.
- Alpha-glucosidase (EC 3.2.1.20) (gene malA) from the archaebacteria Sulfolobus solfataricus.
- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10) is a vertebrate membrane-bound, multifunctional enzyme
  complex which hydrolyzes sucrose, maltose and isomaltose. The sucrase and isomaltase domains of the enzyme
  are homologous (41% of amino acid identity) and have most probably evolved by duplication.
- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species.
- Yeast hypothetical protein YBR229c.
- Fission yeast hypothetical protein SpAC30D11.01c.

[1832] An aspartic acid has been implicated [4] in the catalytic activity of sucrase, isomaltase, and lysosomal
alpha-glucosidase. The region around this active residue is highly conserved and can be used as a signature pattern.
A second region, which contains two conserved cysteines, has been used as an additional signature pattern.
[1833] Consensus pattern: [GF]-[LIVMF]-W-x-D-M-[NSA]-E [D is the active site residue]
so Consensus pattern G-(Ayl-[HLIV^m]-C-G-[PY]-x(3)-[ST^x(3)-L-C-x-R-W-x(2)-[Lyl-[GSA]-^ 

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).
[ 2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991).
[ 3] Naim H.Y., Niermann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett. 294:109-112(1991).
[ 4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol. Chem.
266:13507-13512(1991).

[1834] 778. Urease signatures
[1835] Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and
ammonia [1]. Historically, it was the first enzyme to be crystallized (in 1926). It is mainly found in plant seeds,
microorganisms and invertebrates. In plants, urease is a hexamer of identical chains. In bacteria [2], it consists of
either two or three different subunits (alpha, beta and gamma).
[1836] Urease binds two nickel ions per subunit; four histidines, an aspartate and a carbamated lysine serve as
ligands to these metals; an additional histidine is involved in the catalytic mechanism [3].

[1837] As signatures for this enzyme, a region was selected that contains two histidines that bind one of the nickel
ions, together with the region of the active site histidine.

[1838] Consensus pattern: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P [The two H's bind nickel]
Consensus pattern: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A [H is the active site residue]

[ 1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).
[ 2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).
[ 3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

[1839] 779. Tyrosine specific protein phosphatases signature and profiles

[1840] Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal
of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be
classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s).
The currently known PTPases are listed below:
[1841] Soluble PTPases.

- PTPN1 (PTP-1B).
- PTPN2 (T-cell PTPase; TC-PTP).
- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>)
  and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP).
- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the
  SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).
- PTPN8 (70Z-PEP).
- PTPN9 (MEG2).
- PTPN12 (PTP-G1; PTP-P19).
- Yeast PTP1.
- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.
- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.
- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.
- Yeast CDC14, which may be involved in chromosome segregation.
- Yersinia virulence plasmid PTPases (gene yopH).
- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183
  and Tyr-185.
- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr
  residues.
- DUSP3 (VHR).
- DUSP4 (HVH2).
- DUSP5 (HVH3).
- DUSP6 (Pyst1; MKP-3).
- DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.
- Yeast YVH1.
- Vaccinia virus H1 PTPase; a dual specificity phosphatase.

[1843] Receptor PTPases.
[1844] Structurally, all known receptor PTPases are made up of a variable length extracellular domain, followed by
a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain
fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains
in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first
seems to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In
these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.
[1845] In the following table, the domain structure of known receptor PTPases is shown:



                                          Extracellular                Intracellular
                                          -------------------------    -------------
                                          Ig     FN-3   CAH    MAM     PTPase
Leukocyte common antigen (LCA) (CD45)     0      2      0      0       2
Leukocyte antigen related (LAR)           3      8      0      0       2
Drosophila DLAR                           3      9      0      0       2
Drosophila DPTP                           2      2      0      0       2
PTP-alpha (LRP)                           0      0      0      0       2
PTP-beta                                  0      16     0      0       1
PTP-gamma                                 0      1      1      0       2
PTP-delta                                 0      >7     0      0       2
PTP-epsilon                               0      0      0      0       2
PTP-kappa                                 1      4      0      1       2
PTP-mu                                    1      4      0      1       2
PTP-zeta                                  0      1      1      0       2



[1846] PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has
been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity
have also been shown to be important.

[1847] A signature pattern was derived for PTPase domains centered on the active site cysteine.
[1848] There are three profiles for PTPases: the first one spans the complete domain and is not specific to any
subtype. The second profile is specific to dual-specificity PTPases and the third one to the PTP subfamily.
[1849] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]
[1850] Note: the M-phase inducer phosphatases (cdc25-type phosphatase) are tyrosine-protein phosphatases that
are not structurally related to the above PTPases.

[1851] Note: this documentation entry is linked to both a signature pattern and to profiles. As profiles are much more
sensitive than the pattern, you should use them if you have access to the necessary software tools to do so.

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[ 5] Hunter T. Cell 58:1013-1016(1989).
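
Because the PTPase signature is centered on the active site cysteine, a scan can report the position of that cysteine rather than just the start of the match, which is useful when a receptor PTPase carries two copies of the domain. The following sketch is an illustration under stated assumptions: the function name `active_site_cysteines` is hypothetical, and the test fragments are synthetic sequences built to satisfy or violate the pattern, not real phosphatase sequences.

```python
import re

# [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY], hand-translated.
# The cysteine sits in capture group 1; the lookahead allows every
# occurrence to be reported, even if two matches were to overlap.
PTPASE_SITE = re.compile(r"(?=[LIVMF]H(C).{2}G.{3}[STC][STAGP].[LIVMFY])")

def active_site_cysteines(seq):
    """Return the 1-based position of the cysteine in every signature match."""
    return [m.start(1) + 1 for m in PTPASE_SITE.finditer(seq)]

# Synthetic fragments (assumptions, for illustration only).
motif = "VHCSAGIGRSGTF"                 # satisfies the consensus pattern
one_domain = "AAAA" + motif + "AAAA"
two_domains = "AAAA" + motif + "AAAAAA" + motif + "AA"
cys_to_ser = "AAAA" + "VHSSAGIGRSGTF" + "AAAA"   # cysteine replaced by serine

print(active_site_cysteines(one_domain))
print(active_site_cysteines(two_domains))
print(active_site_cysteines(cys_to_ser))
```

As the entry notes, the profiles are more sensitive than this pattern, so a pattern scan of this kind is best treated as a quick first pass.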



[1852] 780. Connexins signatures

[1853] Gap junctions [1] are specialized regions of the plasma membrane which consist of closely packed pairs of
transmembrane channels, the connexons, through which small molecules diffuse from a cell to a neighboring cell. Each
connexon is composed of a hexamer of an integral membrane protein which is often referred to as connexin. In a
given species there are a number of different, yet structurally related, tissue-specific forms of connexins. The types
of connexins which are currently known are listed below.

- Connexin 56 (Cx56).
- Connexin 50 (Cx50) (lens fiber protein MP70).
- Connexin 46 (Cx46) (alpha-3).
- Connexin 45 (Cx45) (alpha-6).
- Connexin 43 (Cx43) (alpha-1).
- Connexin 40 (Cx40) (alpha-5).
- Connexin 38 (Cx38) (alpha-2).
- Connexin 37 (Cx37) (alpha-4).
- Connexin 33 (Cx33) (alpha-7).
- Connexin 32 (Cx32) (beta-1).
- Connexin 31.1 (Cx31.1) (beta-4).
- Connexin 31 (Cx31) (beta-3).
- Connexin 30.3 (Cx30.3) (beta-5).
- Connexin 26 (Cx26) (beta-2).

[1854] Structurally the connexins consist of a short cytoplasmic N-terminal domain, followed by four transmembrane
segments that delimit two extracellular and one cytoplasmic loops; the C-terminal domain is cytoplasmic and its length
is variable (from 20 residues in Cx26 to 260 residues in Cx56). The schematic representation of this structure is shown
below.



[Schematic: the NH2 and COOH termini lie on the cytoplasmic side; four transmembrane segments cross the membrane
and are joined by two extracellular loops and one cytoplasmic loop.]

[1855] The sequences of the two extracellular loops are well conserved. In both loops there are three conserved
cysteines which are involved in disulfide bonds. A signature pattern from each of these two loop regions has been built.
[1856] Consensus pattern: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-[FY]-D [The three C's are involved in disulfide bonds]
Consensus pattern: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P [The three C's are involved in
disulfide bonds]

[1857] [ 1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996).
[1858] 781. Gram-positive cocci surface proteins 'anchoring' hexapeptide

[1859] Surface proteins from Gram-positive cocci contain a conserved hexapeptide located a few residues upstream
of a hydrophobic C-terminal membrane anchor region which is followed by a cluster of basic amino acids [1].
This structure is represented in the following schematic representation:

+--------------------------------------+-+--------+-+
| Variable length extracellular domain |H| Anchor |B|
+--------------------------------------+-+--------+-+

'H': conserved hexapeptide.
'B': cluster of basic residues.

[1860] It has been proposed that this hexapeptide sequence is responsible for a post-translational modification
necessary for the proper anchoring of the proteins which bear it to the cell wall.
Proteins known to contain such a hexapeptide are listed below:

- Aggregation substance from Streptococcus faecalis (asa1).
- C5a peptidase from Streptococcus pyogenes (scpA).
- C protein alpha-antigen from Streptococcus agalactiae (bca).
- Cell surface antigen I/II (PAC) from Streptococcus mutans.
- Dextranase from Streptococcus downei (dex).
- Fibronectin-binding protein from Staphylococcus aureus (fnbA).
- Fimbrial subunits from Actinomyces naeslundii and viscosus.
- IgA binding protein from Streptococcus pyogenes (arp4).
- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).
- IgG binding proteins from Streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).
- M proteins from streptococci.
- Muramidase-released protein from Streptococcus suis (mrp).
- Nisin leader peptide processing protease from Lactococcus lactis (nisP).
- Protein A from Staphylococcus aureus.
- Trypsin-resistant surface T protein from streptococci.
- Wall-associated protein from Streptococcus mutans (wapA).
- Wall-associated serine proteinases from Lactococcus lactis.

[1861] Consensus pattern: L-P-x-T-G-[STGAVDE]
[1862] [ 1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).
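
The hexapeptide is short, so a match is more convincing when it sits in the arrangement shown in the schematic: hexapeptide, then an anchor region, then basic residues at the C-terminus. The sketch below locates the last match of the pattern and summarises what follows it. It is only an illustration: the function name `cell_wall_anchor`, the `tail_len` parameter, its default of 7, and the test sequence are all hypothetical and are not taken from the entry or from any protein listed in it.

```python
import re

# L-P-x-T-G-[STGAVDE], hand-translated to a regular expression.
HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")

def cell_wall_anchor(seq, tail_len=7):
    """Locate the last hexapeptide match and describe the residues after it.

    tail_len (assumed value) is how many C-terminal residues are inspected
    for the cluster of basic residues.
    """
    matches = list(HEXAPEPTIDE.finditer(seq))
    if not matches:
        return None
    m = matches[-1]
    downstream = seq[m.end():]
    tail = downstream[-tail_len:]
    return {
        "hexapeptide": m.group(),
        "start": m.start() + 1,                     # 1-based position
        "residues_after": len(downstream),          # anchor plus basic cluster
        "basic_in_tail": sum(r in "KR" for r in tail),
    }

# Synthetic C-terminal region (assumption, for illustration only).
prefix = "MSSDEQ"
hexapeptide = "LPETGE"
anchor = "ENPFIGTTVFGGLSLALGAALLAG"
basic_tail = "RRREL"
result = cell_wall_anchor(prefix + hexapeptide + anchor + basic_tail)
print(result)
```

A caller could then apply its own thresholds, for example requiring some minimum number of basic residues in the tail; any such cutoff would be a choice of the user and is not specified by the entry.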
[1863] 782. Gamma-glutamyltranspeptidase signature

[1864] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety
of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role
in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and
eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a
single chain precursor. The active site of GGT is known to be located in the light subunit.

[1865] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2].
Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid
(GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show
some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.

[1866] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these
enzymes. This region has been used as a signature pattern.

[1867] Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G

[ 1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).
[ 2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).
[ 3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).

[1868] 783. Ferrochelatase signature 
4S [1869] Ferrochelatase (EC 4.99.1 .1 ) (protoheme ferro-lyase) [1,2] catalyzes the last step in heme bbsynthesis: the 
chelatbn of a ferrous ion to proto-porphyrin IX, to form protoheme. 

[1870] In eukaryotes, ferrochelatase is a mitochondrial protein bound to the inner membrane, whose active site faces 
the mitochondrial matrix. The mature form of eukaryotic ferrochelatase is composed of about 360 amino acids. In 
bacteria, ferrochelatase (gene hemH) [3] is a protein of from 310 to 380 amino ackis. 
so [1871] The human autosomal dominant disease protoporphyria is due to the reduced activity of ferrochelatase. 
[1 872] The signature pattern for this enzyme is based on a consen/ed regkxi which contains a histidine reskJue which 
could be involved in binding iron. 

[1873] Consensus pattem[UVMFl(2)-x-[ST]-x-H-IGS]-ILlVM]-P-x(4.5)-[DENQKR]-x-G-[DP]-x(1 .2)-Y 

SS [1] Labbe-Bois R. J. Bol. Chem. 265:7278-7283(1990). 

[ 2] Brenner D.A.. Frasier F. Proc. Matt. Acad. Sci. U.S.A. 88:849-853(1991). 

[ 3] Miyamoto K.. Nakahigashi K.. NIshimura K., Inokuchi H. J. Mol. Biol. 219:393-398(1991). 
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[1874] 784. Cellulose-binding domain, bacterial type 

[1875] The microbial degradation of cellulose and xylans requires several types of enzyme such as endoglucanases 
(EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), orxylanases (EC 3.2.1.8) [1]. 
[1876] Structurally, ceilulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding 
5 domain (CBD) by a short linker sequence rich in proline and^r hydroxy-amino acids. 

[1877] The CBD of a number of bacterial ceilulases has been shown to consist of about 105 amino acid residues 
[2]. Enzymes known to contain such a domain are: 

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.
- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.
- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.
- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.
- Endoglucanase A (gene celA) from Microbispora bispora.
- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens.
- Endoglucanase A (gene celA) from Streptomyces lividans.
- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.
- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens.
- Arabinofuranosidase C (EC 3.2.1.55) (xylanase C) (gene xynC) from Pseudomonas fluorescens.
- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.
- Chitinase C from Streptomyces lividans.

[1878] The CBD domain is found either at the N-terminal or at the C-terminal extremity of these enzymes. As shown in the following schematic representation, there are two conserved cysteines in this CBD domain, one at each extremity of the domain, which have been shown [3] to be involved in a disulfide bond. There are also four conserved tryptophan residues which could be involved in the interaction of the CBD with polysaccharides.



i I 

xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx 
'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.



Consensus pattern: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]

[ 1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 2] Meinke A., Gilkes N.R., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Protein Seq. Data Anal. 4:349-353(1991).
[ 3] Gilkes N.R., Claeyssens M., Aebersold R., Henrissat B., Meinke A., Morrison H.D., Kilburn D.G., Warren R.A.J., Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991).

[1879] 785. Amidases signature

[1880] It has been shown [1,2,3] that several enzymes from various prokaryotic and eukaryotic organisms which are involved in the hydrolysis of amides (amidases) are evolutionarily related. These enzymes are listed below.

- Indoleacetamide hydrolase (EC 3.5.1.-), a bacterial plasmid-encoded enzyme that catalyzes the hydrolysis of indole-3-acetamide (IAM) into indole-3-acetate (IAA), the second step in the biosynthesis of auxins from tryptophan.
- Acetamidase from Emericella nidulans (gene amdS), an enzyme which allows acetamide to be used as a sole carbon or nitrogen source.
- Amidase (EC 3.5.1.4) from Rhodococcus sp. N-774 and Brevibacterium sp. R312 (gene amdA). This enzyme hydrolyzes propionamides efficiently and, at a lower efficiency, acetamide, acrylamide and indoleacetamide.
- Amidase (EC 3.5.1.4) from Pseudomonas chlororaphis.
- 6-aminohexanoate-cyclic-dimer hydrolase (EC 3.5.2.12) (nylon oligomers degrading enzyme E1) (gene nylA), a bacterial plasmid-encoded enzyme which catalyzes the first step in the degradation of 6-aminohexanoic acid cyclic dimer, a by-product of nylon manufacture [4].
- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].
- Mammalian fatty acid amide hydrolase (gene FAAH) [6].
- A putative amidase from yeast (gene AMD2).
- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD.

[1881] All these enzymes contain in their central section a highly conserved region rich in glycine, serine, and alanine residues. This region has been used as a signature pattern.

Consensus pattern: G-[GA]-S-[GS]-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSAT]-x-[GA]-x-[DE]-x-[GA]-x-S-[LIVM]-R-x-P-[GSAC]

[ 1] Mayaux J.-F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764-6773(1990).
[ 2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys. Acta 1088:225-233(1991).
[ 3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990).
[ 4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol. 171:3187-3191(1989).
[ 5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M., Soll D. Proc. Natl. Acad. Sci. U.S.A. 94:11819-11826(1997).
[ 6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature 384:83-87(1996).

[1882] 786. Glycosyl hydrolases family 10 active site

[1883] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family F [3] or as the glycosyl hydrolases family 10 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase A (xynA).
- Bacillus sp. strain 125 xylanase (xynA).
- Bacillus stearothermophilus xylanase.
- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB).
- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This protein consists of two domains; it is the N-terminal domain, which has exoglucanase activity, which belongs to this family.
- Caldocellum saccharolyticum xylanase A (xynA).
- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC operon and is probably a xylanase.
- Cellulomonas fimi exoglucanase/xylanase (cex).
- Clostridium stercorarium thermostable celloxylanase.
- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ).
- Cryptococcus albidus xylanase.
- Penicillium chrysogenum xylanase (gene xylP).
- Pseudomonas fluorescens xylanases A (xynA) and B (xynB).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl hydrolases.
- Streptomyces lividans xylanase A (xlnA).
- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).
- Thermoascus aurantiacus xylanase.
- Thermophilic bacterium Rt8.B4 xylanase (xynA).

[1884] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown [5], in the exoglucanase from Cellulomonas fimi, to be directly involved in glycosidic bond cleavage by acting as a nucleophile. This region has been used as a signature pattern.

[1885] Consensus pattern: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF] [E is the active site residue]

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).
[ 5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol. Chem. 266:15621-15625(1991).

[1886] 787. Fructose-bisphosphate aldolase class-II signatures

[1887] Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-II aldolases [2], mainly found in prokaryotes and fungi, are homodimeric enzymes which require a divalent metal ion, generally zinc, for their activity.

[1888] This family also includes the following proteins:

- Escherichia coli galactitol operon protein gatY, which catalyzes the transformation of tagatose 1,6-bisphosphate into glycerone phosphate and D-glyceraldehyde 3-phosphate.
- Escherichia coli N-acetyl galactosamine operon protein agaY, which catalyzes the same reaction as that of gatY.

[1889] As signature patterns for this class of enzyme, two conserved regions were selected. The first pattern is located in the first half of the sequence and contains two histidine residues that have been shown [4] to be involved in binding a zinc ion. The second is located in the C-terminal section and contains clustered acidic residues and glycines.
[1890] Consensus pattern: [FYVMT]-x(1,3)-[LIVMH]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH] [The two H's are zinc ligands]

Consensus pattern: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).
[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).
[ 3] von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J. Mol. Microbiol. 3:1625-1637(1989).
[ 4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993).

[1891] 788. Prolyl oligopeptidase family serine active site
[1892] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related peptidases whose catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. The known members of this family are listed below.

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high degree of sequence conservation between these sequences.
- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the C-terminal side of lysyl and argininyl residues.
- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially from polypeptides having unsubstituted N-termini, provided that the penultimate residue is proline.
- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13), which is responsible for the proteolytic maturation of the alpha-factor precursor.
- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).
- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein with a free amino-terminus.

[1893] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800 amino acids).

[1894] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
[1895] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4,E1].
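
The statement that the active-site serine lies about 150 residues from the C-terminal extremity can be checked on any candidate sequence once the pattern is located. The following is a rough sketch, assuming the consensus pattern above has been translated by hand into a regular expression; the function name `active_site_serine` and the synthetic test sequence are hypothetical.

```python
import re

# Hand-translated regex for
# D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2);
# the capture group marks the active-site serine.
PATTERN = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")


def active_site_serine(seq: str):
    """Return (1-based position of the serine, residues remaining to the
    C-terminus), or None if the pattern is not found."""
    match = PATTERN.search(seq)
    if match is None:
        return None
    position = match.start(1) + 1          # convert 0-based index to 1-based
    return position, len(seq) - position


if __name__ == "__main__":
    # Synthetic sequence for illustration only, not a real peptidase.
    demo = "M" * 10 + "D" + "A" * 7 + "L" + "A" * 14 + "GASAGGLL" + "A" * 150
    print(active_site_serine(demo))
```

Only the first occurrence is reported; a real analysis would also confirm the histidine and aspartate of the catalytic triad, which this pattern does not cover.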


[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992).
[ 3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[1896] 789. Formate-tetrahydrofolate ligase signatures

[1897] Formate-tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase) (FTHFS) is one of the enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. In many of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9).

[1898] In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C-1-tetrahydrofolate synthase (C1-THF synthase), which also catalyzes the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF synthases are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino acid residues [2].

[1899] The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature patterns, two regions that are almost perfectly conserved were selected. The first one is a glycine-rich segment located in the N-terminal part of FTHFS and which could be part of an ATP-binding domain [2]. The second pattern is located in the central section of FTHFS.
[1900] Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G
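
When an entry supplies two signatures, a candidate sequence can be tested for both and their relative positions compared with the description (here, a glycine-rich segment near the N-terminus and a second segment in the central section). The sketch below assumes hand-written regular expressions for the two FTHFS patterns; the function name `locate_signatures` and the synthetic sequence are hypothetical.

```python
import re

# Hand-translated forms of the two FTHFS consensus patterns above.
SIG_GLY_RICH = re.compile(r"G[LIVM]KGGAAGGGY")      # N-terminal, glycine-rich
SIG_CENTRAL = re.compile(r"VAT[IV]RALK.[HN]GG")     # central section


def locate_signatures(seq: str):
    """Return the start of each signature as a fraction of sequence length,
    or None if either signature is missing."""
    first = SIG_GLY_RICH.search(seq)
    second = SIG_CENTRAL.search(seq)
    if first is None or second is None:
        return None
    return first.start() / len(seq), second.start() / len(seq)


if __name__ == "__main__":
    # Synthetic sequence for illustration only, not a real FTHFS.
    demo = "M" * 50 + "GLKGGAAGGGY" + "S" * 200 + "VATIRALKQHGG" + "S" * 250
    print(locate_signatures(demo))
```

Requiring both signatures is stricter than accepting either one alone, which reduces chance matches at the cost of missing fragments that carry only one region.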

[ 1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). 

[ 2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990).

[1901] 790. Transthyretin signatures 

[1902] Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to transport thyroxine (T4) from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and forms an internal channel that binds thyroxine. Transthyretin is mainly synthesized in the brain choroid plexus. In humans, variants of the protein are associated with distinct forms of amyloidosis.

[1903] The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterized proteins also belong to this family:

- Escherichia coli hypothetical protein yedX.
- Bacillus subtilis hypothetical protein yunM.
- Caenorhabditis elegans hypothetical protein R09H10.3.
- Caenorhabditis elegans hypothetical protein ZK697.8.

[1904] Two regions were selected as signature patterns. The first, located in the N-terminal extremity, starts with a lysine known to be involved in binding T4. The second pattern is located in the C-terminal extremity.

[1905] Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]

Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]

[1906] [ 1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).

[1907] 791. Dihydropteroate synthase signatures
[1908] All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most microorganisms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells which allows these organisms to use dietary folates. Enzymes that are involved in the biosynthesis of folates are therefore the target of a variety of antimicrobial agents such as trimethoprim or sulfonamides.

[1909] Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of 6-hydroxymethyl-7,8-dihydropteridine pyrophosphate to para-aminobenzoic acid to form 7,8-dihydropteroate. This is the second step in the three-step pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulfonamides, which are substrate analogs that compete with para-aminobenzoic acid.

[1910] Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid residues which is either chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) [2].

[1911] Two signature patterns for DHPS were developed: the first signature is located in the N-terminal section of these enzymes, while the second signature is located in the central section.
[1912] Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]
Consensus pattern: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P

[ 1] Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P. J. Bacteriol. 172:7211-7226(1990).

[ 2] Volpe F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213-218(1992).

[1913] 792. Phosphatidylinositol 3- and 4-kinases signatures

[1914] Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The exact function of the three products of PI3-kinase - PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3) - is not yet known, although it is proposed that they function as second messengers in cell signalling. Currently, three forms of PI3-kinase are known:

- The mammalian enzyme, which is a heterodimer of a 110 Kd catalytic chain (p110) and an 85 Kd subunit (p85) which allows it to bind to activated tyrosine protein kinases. There are at least two different types of p110 subunits (alpha and beta).
- Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle activation. Both are proteins of about 280 Kd.
- Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation. VPS34 is a protein of about 100 Kd.
- Arabidopsis thaliana and soybean VPS34 homologs.

[1915] Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme that acts on phosphatidylinositol (PI) in the first committed step in the production of the second messenger inositol-1,4,5-trisphosphate. Currently the following forms of PI4-kinases are known:

- Human PI4-kinase alpha.
- Yeast PIK1, a nuclear protein of 120 Kd.
- Yeast STT4, a protein of 214 Kd.

[1916] The PI3- and PI4-kinases share a well conserved domain at their C-terminal section; this domain seems to be distantly related to the catalytic domain of protein kinases [2]. Two signature patterns were developed from the best conserved parts of this domain.

[1917] The following additional proteins belong to this family:

- Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the target for the cell-cycle arrest and immunosuppressive effects of the FKBP12-rapamycin complex.
- Yeast protein ESR1 [6], which is required for cell growth, DNA repair and meiotic recombination.
- Yeast protein TEL1, which is involved in controlling telomere length.
- Yeast hypothetical protein YHR099w, a distantly related member of this family.
- Fission yeast hypothetical protein SpAC22E12.16C.

[1918] Consensus pattern: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q
Consensus pattern: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-

[ 1] Hiles I.D., Otsu M., Volinia S., Fry M.J., Gout I., Dhand R., Panayotou G., Ruiz-Larrea F., Thompson A., Totty N.F., Hsuan J.J., Courtneidge S.A., Parker P.J., Waterfield M.D. Cell 70:419-429(1992).
[ 2] Kunz J., Henriquez R., Schneider U., Deuter-Reinhard M., Movva N.R., Hall M.N. Cell 73:585-596(1993).
[ 3] Schu P.V., Takegawa K., Fry M.J., Stack J.H., Waterfield M.D., Emr S.D. Science 260:88-91(1993).
[ 4] Garcia-Bustos J.F., Marini F., Stevenson I., Frei C., Hall M.N. EMBO J. 13:2352-2361(1994).
[ 5] Brown E.J., Albers M.W., Shin T.B., Ichikawa K., Keith C.T., Lane W.S., Schreiber S.L. Nature 369:756-758(1994).
[ 6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).

[1919] 793. FAD-dependent glycerol-3-phosphate dehydrogenase signatures 

[1920] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA) and one under aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1921] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand calcium-binding region (See <PDOC00018>) in its C-terminal extremity.

[1922] Two signature patterns were developed: one based on the first half of the FAD-binding domain and one which corresponds to a conserved region in the central part of these enzymes.
[1923] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[ 1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).
[ 2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).
[ 3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1924] 794. NOL1/NOP2/sun family signature

[1925] The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1), which may play a role in the regulation of the cell cycle and the increased nucleolar activity that is associated with cell proliferation.
- Yeast nucleolar protein NOP2 (or YNA1), which could be involved in nucleolar function during the onset of growth, and in the maintenance of nucleolar structure.
- Yeast hypothetical protein YBL024w.
- Bacterial protein sun (also known as fmu).
- Escherichia coli hypothetical protein yebU.
- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24.
- Methanococcus jannaschii hypothetical protein MJ0026.

NOL1 is a protein of 855 residues, NOP2 consists of 618 residues, YBL024w of 684, sun is a protein of about 430 to 450 residues and MJ0026 has 274 residues. They share a conserved central domain which contains some highly conserved regions. One of these regions was selected as a signature pattern.
[1926] Consensus pattern: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]
[1927] 795. moaA / nifB / pqqE family signature

[1928] A number of proteins involved in the biosynthesis of metallo cofactors have been shown [1,2] to be evolutionarily related. These proteins are:

- Bacterial and archaebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor (molybdopterin; MPT).
- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis and which is highly similar to moaA.
- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.
- Bacterial protein nifB (or fixZ), which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor.
- Bacterial protein pqqE, which is involved in the biosynthesis of the cofactor pyrrolo-quinoline-quinone (PQQ).
- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor.
- Caenorhabditis elegans hypothetical protein F49E2.1.

[1929] All these proteins share, in their N-terminal region, a conserved domain that contains three cysteines. In moaA, these cysteines have been shown [1] to be important for the biological activity. They could be involved in the binding of an iron-sulfur cluster.
[1930] Consensus pattern: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C [The three C's are putative Fe-S ligands]
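
Because the three cysteines of this pattern are the putative Fe-S ligands, it can be useful to report their positions and not just whether the pattern matches. The following is a minimal sketch under that assumption, using a hand-written regular expression with one capture group per cysteine; the function name `cysteine_ligands` and the short test peptide are hypothetical.

```python
import re

# Hand-translated regex for
# [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C,
# with each cysteine in its own capture group.
MOTIF = re.compile(r"[LIV].{3}(C)[NP][LIVMF][QRS](C).[FYM](C)")


def cysteine_ligands(seq: str):
    """Return the 1-based positions of the three pattern cysteines,
    or None if the pattern is absent."""
    match = MOTIF.search(seq)
    if match is None:
        return None
    return [match.start(group) + 1 for group in (1, 2, 3)]


if __name__ == "__main__":
    # Synthetic peptide for illustration only.
    print(cysteine_ligands("MKKLAAACNLQCAYCGG"))
```

The positions returned could then be used, for example, to design substitutions at each cysteine, in the spirit of the moaA experiments cited as [1].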

[ 1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[ 2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

[1931] 796. Forkhead-associated (FHA) domain profile

[1932] The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain found in a variety of otherwise unrelated proteins. The FHA domain comprises approximately 55 to 75 amino acids and contains three highly conserved blocks separated by divergent spacer regions. Currently it has been found in the following proteins:

- Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte nuclear factor 1 (MNF1), yeast transcription factor FHL1, which probably controls pre-mRNA processing, and yeast FKH1 and FKH2. In those proteins the FHA domain is located N-terminal of the DNA-binding FH domain.
- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which specifically interacts with the receptor-type Ser/Thr-kinase RLK5. In KAPP, the FHA domain maps to a region that interacts with the receptor-type protein kinase RLK5 only if the kinase is phosphorylated on serine residues [2].
- Two protein kinases from yeast that are involved in mediating the nuclear response to DNA damage: DUN1 and SPK1/SAD1 [3]. The latter is the only known protein containing two copies of the FHA domain.
- Protein kinase cds1 from fission yeast, which contains an FHA domain and might be the ortholog of SPK1.
- Protein kinase MEK1 from yeast, which is involved in meiotic recombination.
- Human nuclear antigen Ki67, which is expressed only in proliferating cells.
- Yeast hypothetical protein YHR115c, which contains a RING-finger C-terminal of the FHA domain.
- Yeast hypothetical proteins L8083.1 and 9346.10, which contain an extensive coiled-coil region C-terminal of the FHA domain.
- Caenorhabditis elegans hypothetical protein ZK632.2.
- Caenorhabditis elegans hypothetical protein C01G6.5.
- FraH from the prokaryote Anabaena, which contains a zinc-finger motif N-terminal of the FHA domain.
- An ORF from the bacterium Streptomyces, which is on the opposite strand of the protein kinase pks1, overlapping the ORF of the kinase.

[ 1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995).
[ 2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[ 3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

[1933] 797. Ald_Xan_dh_C

Aldehyde oxidase and xanthine dehydrogenase, C terminus

[1934] [1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber R; Medline: 96072968; "Crystal structure of the xanthine oxidase-related aldehyde oxido-reductase from D. gigas." Science 1995;270:1170-1176.

Number of members: 54

[1935] 798. Glyco_hydro_38 
Glycosyl hydrolases family 38

[1936] Glycosyl hydrolases are key enzymes of carbohydrate metabolism.

Number of members: 20

[1937] [1] Henrissat B; Medline: 98313424; "Glycosidase families." Biochem Soc Trans 1998;26:153-156.
[1938] 799. HECT 
HECT-domain (ubiquitin-transferase).

[1939] The name HECT comes from Homologous to the E6-AP Carboxyl Terminus.

Number of members: 43

[1940] [1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family of proteins structurally and functionally related to the E6-AP ubiquitin-protein ligase." Proc Natl Acad Sci U S A 1995;92:2563-2567.
[1941] 800. HRDC 
HRDC domain

[1942] The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations in the HRDC domain cause human disease.

Number of members: 19

[1943] [1] Morozov V, Mushegian AR, Koonin EV, Bork P; Medline: 98060076; "A putative nucleic acid-binding domain in Bloom's and Werner's syndrome helicases." Trends Biochem Sci 1997;22:417-418.
[1944] 801. Integrase 

[1945] Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc binding domain. The central domain is the catalytic domain [1]. The carboxyl terminal domain is a DNA binding domain [2].
Number of members: 581
[1946]

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline: 95099322; "Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases." Science 1994;266:1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM; Medline: 95359147; "Solution structure of the DNA binding domain of HIV-1 integrase." Biochemistry 1995;34:9826-9833.

[1947] 802. lig_chan
Ligand-gated ion channel

[1948] This family includes the four transmembrane regions of the ionotropic glutamate receptors and NMDA receptors.

Number of members: 128

[1949] [1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA receptors by calcineurin." Science 1995;267:1510-1512.
[1950] 803. RhoGAP
RhoGAP domain

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases.
Number of members: 97

[1952]

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the breakpoint cluster region-homology domain from phosphoinositide 3-kinase p85 alpha subunit." Proc Natl Acad Sci U S A 1996;93:14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ, Musacchio A, Smerdon SJ, Eccleston JF; Medline: 97162209; "The structure of the GTPase-activating domain from p50rhoGAP." Nature 1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ, Smerdon SJ; Medline: 97404320; "Crystal structure of a small G protein in complex with the GTPase-activating protein rhoGAP." Nature 1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its relatives." Nature 1993;366:643-654.

[1953] 804. vwd

von Willebrand factor type D domain

[1954] [1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth regulators related to connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92

[1955] 805. zf-C4_Topoisom 
Topoisomerase DNA binding C4 zinc finger 

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA topoisomerase I is a zinc metalloprotein with three repetitive zinc-binding domains." J Biol Chem 1988;263:15857-15859.
[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli DNA topoisomerase I is part of a high-affinity DNA binding domain." Biochem Biophys Res Commun 1998;251:509-514.

Number of members: 51

[1956] 806. AIRC 
AIR carboxylase
Members of this family catalyse the decarboxylation of 1-(5-phosphoribosyl)-5-amino-4-imidazole-carboxylate (AIR). This family catalyses the sixth step of de novo purine biosynthesis. Some members of this family contain two copies of this domain. Number of members: 35
[1957] 807. Bromodomain signature and profile
PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014; BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the following proteins:

- Higher eukaryotes transcription initiation factor TFIID 250 Kd subunit (TBP-associated factor p250) (gene CCG1). P250 is associated with the TFIID TATA-box binding protein and seems essential for progression of the G1 phase of the cell cycle.
- Human RING3, a protein of unknown function encoded in the MHC class II locus.
- Mammalian CREB-binding protein (CBP), which mediates cAMP-gene regulation by binding specifically to phosphorylated CREB protein.
- Drosophila female sterile homeotic protein (gene fsh), required maternally for proper expression of other homeotic genes involved in pattern formation, such as Ubx.
- Drosophila brahma protein (gene brm), a protein required for the activation of multiple homeotic genes.
- Mammalian homologs of brahma. In human, three brahma-like proteins are known: SNF2a (hBRM), SNF2b, and BRG1.
- Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.
- Human peregrin (or Br140).
- Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes including snRNAs.
- Yeast GCN5, a general transcriptional activator operating in concert with certain other DNA-binding transcriptional activators, such as GCN4, HAP2/3/4 or ADA2.
- Yeast NPS1/STH1, involved in G(2) phase control in mitosis.
- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and ADR6/SWI1 proteins. This SWI-complex is involved in transcriptional activation.
- Yeast SPT7, a transcriptional activator of Ty elements and possibly other genes.
- Caenorhabditis elegans protein cbp-1.
- Yeast hypothetical protein YGR056w.
- Yeast hypothetical protein YKR008w.
- Yeast hypothetical protein L9638.1.

[1958] Some proteins contain a region which, while similar to some extent to a classical bromodomain, diverges from
it by either lacking part of the domain or because of an insertion. These proteins are:

- Mammalian protein HRX (also known as All-1 or MLL), a protein involved in translocations leading to acute leukemias
and which possibly acts as a transcriptional regulatory factor. HRX contains a region similar to the C-terminal
half of the bromodomain.

- Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has a 23 amino-acid insertion.

- Yeast protein YTA7. This protein contains a region with significant similarity to the C-terminal half of the bromodomain.
As it is a member of the AAA family (see <PDOC00572>) it is also in a functionally different context.

[1959] The above proteins generally contain a single bromodomain, but some of them contain two copies; this is the
case for BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.

[1960] The exact function of this domain is not yet known, but it is thought to be involved in protein-protein interactions,
and it may be important for the assembly or activity of multicomponent complexes involved in transcriptional activation.

[1961] The consensus pattern that has been developed spans a major part of the bromodomain; a more sensitive
detection is available through the use of a profile which spans the whole domain.
Consensus pattern: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-[LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY]

References 
[1962]

[ 1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids Res. 20:2603-2603
(1992).
[ 2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C., Kennison J.A. Cell 68:
561-572(1992).

[ 3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).

[1963] 808. (CH) Actinin-type actin-binding domain signatures

PROSITE cross-reference(s): PS00019; ACTININ_1, PS00020; ACTININ_2

[1964] Alpha-actinin is an F-actin cross-linking protein which is thought to anchor actin to a variety of intracellular
structures [1]. The actin-binding domain of alpha-actinin seems to reside in the first 250 residues of the protein. A
similar actin-binding domain has been found in the N-terminal region of many different actin-binding proteins [2,3]:

- In the beta chain of spectrin (or fodrin).

- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD) and which may play a role in anchoring
the cytoskeleton to the plasma membrane.

- In the slime mold gelation factor (or ABP-120).

- In actin-binding protein ABP-280 (or filamin), a protein that links actin filaments to membrane glycoproteins.

- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from the above proteins in that it contains two
tandem copies of the actin-binding domain and that these copies are located in the C-terminal part of the protein.

[1965] Two conserved regions were selected as signature patterns for this type of domain. The first of these regions is
located at the beginning of the domain, while the second one is located in the central section and has been shown to
be essential for the binding of actin.
[1966] Consensus pattern: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N

Consensus pattern: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-
[LIVM]-[LIVMT]-W-x-[LIVM](2)

[ 1] Schleicher M., Andre E., Harmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[ 2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991).
[ 3] Dubreuil R.R. BioEssays 13:219-226(1991).

[1967] 809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature PROSITE cross-reference
(s): PS00077; COX1

Heme-copper respiratory oxidases [1] are oligomeric integral membrane protein complexes that catalyze the terminal
step in the respiratory chain: they transfer electrons from cytochrome c or a quinol to oxygen. Some terminal oxidases
generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner
membrane (eukaryotes). The enzyme complex consists of 3-4 subunits (prokaryotes) up to 13 polypeptides (mammals)
of which only the catalytic subunit (equivalent to mammalian subunit 1 (CO I)) is found in all heme-copper respiratory
oxidases. The presence of a bimetallic center (formed by a high-spin heme and copper B) as well as a low-spin heme,
both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I is common
to all family members [2-4].

[1968] In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The
enzyme complexes vary in heme and copper composition, substrate type and substrate affinity. The different respiratory
oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions
[1].

[1969] Recently, a component of an anaerobic respiratory chain has also been found to contain the copper B binding
signature of this family: nitric oxide reductase (NOR) exists in denitrifying species of Archaea and Eubacteria.
[1970] Enzymes that belong to this family are:

- Mitochondrial-type cytochrome c oxidase (EC 1.9.3.1) which uses cytochrome c as electron donor. The electrons
are transferred via copper A (Cu(A)) and heme a to the bimetallic center of CO I that is formed by a penta-coordinated
heme a and copper B (Cu(B)). Subunit 1 contains 12 transmembrane regions. Cu(B) is said to be ligated
to three of the conserved histidine residues within the transmembrane segments 6 and 7.

- Quinol oxidase from prokaryotes that transfers electrons from a quinol to the binuclear center of polypeptide 1. This
category of enzymes includes Escherichia coli cytochrome O terminal oxidase complex which is a component of
the aerobic respiratory chain that predominates when cells are grown at high aeration.

- FixN, the catalytic subunit of a cytochrome c oxidase expressed in nitrogen-fixing bacteroids living in root nodules.
The high affinity for oxygen allows oxidative phosphorylation under low oxygen concentrations. A similar enzyme
has been found in other purple bacteria.

- Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces nitrate to dinitrogen. It is a heterodimer
of norC and the catalytic subunit norB. The latter contains the 6 invariant histidine residues and 12 transmembrane
segments [5].

[1971] As a signature pattern the copper-binding region was used.
[1972] Consensus pattern: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H [The three H's are copper B ligands]
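
By way of illustration only, a signature of this kind can be applied to a sequence with an ordinary regular expression. The Python sketch below assumes the copper B pattern reads [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H and captures the three histidines so that their positions can be reported. The function name `copper_b_ligands` and the toy sequence are hypothetical and are not taken from any real protein.

```python
import re

# Regex form of [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H.
# Each histidine is wrapped in a capture group so it can be located.
COX1_CUB = re.compile(r"[YWG][LIVFYWTA]{2}[VGS](H)[LNP].V.{44,47}(H)(H)")


def copper_b_ligands(sequence):
    """Return the 1-based positions of the three histidines, or None."""
    match = COX1_CUB.search(sequence)
    if match is None:
        return None
    return tuple(match.start(group) + 1 for group in (1, 2, 3))


# Toy sequence (made up): signature start, a 45-residue spacer, then H-H.
toy = "MM" + "WLFGHPAV" + "A" * 45 + "HH" + "KK"

if __name__ == "__main__":
    print(copper_b_ligands(toy))
```

The variable-length gap x(44,47) is why the two C-terminal histidines cannot be found by a fixed offset from the first one; the regular expression engine tries each permitted gap length in turn.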

[1973] Note: cytochrome bd complexes do not belong to this family.
[1]
Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B. J. Bacteriol. 176:5587-5600(1994).
[2]
Castresana J., Luebben M., Saraste M., Higgins D.G. EMBO J. 13:2516-2525(1994).
[3]
Capaldi R.A., Malatesta P., Darley-Usmar V.M.
Biochim. Biophys. Acta 726:135-148(1983).
[4]
Holm L., Saraste M., Wikstrom M.
EMBO J. 6:2819-2823(1987).
[5]
Saraste M., Castresana J.
FEBS Lett. 341:1-4(1994).

[1974] 810. (dehydrog_molyb) Eukaryotic molybdopterin oxidoreductases signature PROSITE cross-reference(s):
PS00559; MOLYBDOPTERIN_EUK
[1975] A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been
shown [1] to share a few regions of sequence similarity. These enzymes are:

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant
reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains:
an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding
domain and a C-terminal Mo-pterin domain.

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly
similar to xanthine dehydrogenase in its sequence and domain structure.

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about
900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain
(see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about
460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

[1976] There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes.
The pattern used to detect these proteins is based on one of them. It contains a cysteine residue which could be
involved in binding the molybdopterin cofactor.

[1977] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]-R-x(2)-[DE]
[1]
Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle
W.A., Bray R.C.
Biochim. Biophys. Acta 1057:157-185(1991).

811. (DNA_ligase) ATP-dependent DNA ligase signatures

PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2
[1978] DNA ligase (polydeoxyribonucleotide synthase) is the enzyme that joins two DNA fragments by catalyzing
the formation of an internucleotide ester bond between phosphate and deoxyribose. It is active during DNA replication,
DNA repair and DNA recombination. There are two forms of DNA ligase: one requires ATP (EC 6.5.1.1), the other NAD
(EC 6.5.1.2).

[1979] Eukaryotic, archaebacterial, virus and phage DNA ligases are ATP-dependent. During the first step of the
joining reaction, the ligase interacts with ATP to form a covalent enzyme-adenylate intermediate. A conserved lysine
residue is the site of adenylation [1,2].

[1980] Apart from the active site region, the only conserved region common to all ATP-dependent DNA ligases is
found [3] in the C-terminal section and contains a conserved glutamate as well as four positions with conserved basic
residues.

[1981] Signature patterns were developed for both conserved regions.
[1982] Consensus pattern: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site residue]
[1983] Consensus pattern: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-^^^
Sequences known to belong to this class detected by the pattern: ALL, except for archaebacterial DNA ligases.

[1]
Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T.
Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991).
[2]
Lindahl T., Barnes D.E.
Annu. Rev. Biochem. 61:251-281(1992).
[3]
Kletzin A.
Nucleic Acids Res. 20:5389-5396(1992).

[1984] 812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures PROSITE cross-reference(s):
PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2

[1985] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of glycerol-3-phosphate
into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled
to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA)
and one in aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1986] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N-terminal
extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand calcium-binding
region (see <PDOC00018>) in its C-terminal extremity.

[1987] Two signature patterns were developed: one based on the first half of the FAD-binding domain, and one which
corresponds to a conserved region in the central part of these enzymes.
[1988] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A
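
When a family is described by two signatures, as here, a sequence can be scanned against both and the hits reported side by side. The following Python sketch assumes the two patterns read [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G and G-G-K-x(2)-[GSTE]-Y-R-x(2)-A. The dictionary `SIGNATURES`, the function `scan` and the toy sequence are hypothetical names and data introduced purely for illustration.

```python
import re

# Hand-translated regex forms of the two signatures (x -> ".", x(n) -> ".{n}").
SIGNATURES = {
    "FAD_G3PDH_1": re.compile(r"[IV]GGG.{2}G[STACV]G.A.D.{3}RG"),
    "FAD_G3PDH_2": re.compile(r"GGK.{2}[GSTE]YR.{2}A"),
}


def scan(sequence):
    """Map each signature name to the 1-based start of its first hit, or None."""
    hits = {}
    for name, regex in SIGNATURES.items():
        match = regex.search(sequence)
        hits[name] = match.start() + 1 if match else None
    return hits


# Toy sequence (made up): an N-terminal site, a spacer, then a central site.
toy = "MK" + "VGGGATGSGCALDAAARG" + "L" * 30 + "GGKLTTYRPLA" + "EE"

if __name__ == "__main__":
    print(scan(toy))
```

Reporting the start positions, rather than a simple yes/no, makes it easy to check that the first signature lies N-terminal to the second, as the description of the two regions would suggest.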

[1]
Austin D., Larson T.J.
J. Bacteriol. 173:101-107(1991).
[2]
Roennow B., Kielland-Brandt M.C.
Yeast 9:1121-1130(1993).
[3]
Brown L.J., McDonald M.J., Lehn D.A., Moran S.M.
J. Biol. Chem. 269:14363-14366(1994).

[1989] 813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature PROSITE cross-reference(s):
PS01242; FPG

[1990] Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase) (gene fpg) is a bacterial
enzyme involved in DNA repair which excises oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine
(Fapy) and 7,8-dihydro-8-oxoguanine (8-oxoG) residues. In addition to its glycosylase activity,
FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 Kd which
binds and requires zinc for its activity.

[1991] The binding site for zinc seems to be located in the C-terminal part of the enzyme, where four conserved and
essential [2] cysteines are located. A signature pattern was developed based on this region.

[1992] Consensus pattern: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q
[The four C's are putative zinc ligands]

[1]
Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S.
Microbiology 141:411-417(1995).
[2]
O'Connor T.R., Graves R.J., Demurcia G., Castaing B., Laval J.
J. Biol. Chem. 268:9063-9070(1993).

[1993] 814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature PROSITE cross-reference(s): PS00462;
G_GLU_TRANSPEPTIDASE

[1994] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety
of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role
in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes,
it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single
chain precursor. The active site of GGT is known to be located in the light subunit.

[1995] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas
cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA)
into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show
some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.
[1996] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these
enzymes. This region was used as a signature pattern.

[1997] Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G

[1]
Tate S.S., Meister A.
Meth. Enzymol. 113:400-419(1985).
[2]
Suzuki H., Kumagai H., Echigo T., Tochikura T.
J. Bacteriol. 171:5169-5172(1989).
[3]
Ishiye M., Niwa M.
Biochim. Biophys. Acta 1132:233-239(1992).

[1998] 815. G-protein gamma subunit profile

PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA

[1999] Guanine nucleotide-binding proteins (G proteins) [1] act as intermediaries in the transduction of signals generated
by transmembrane receptors. G proteins consist of three subunits (alpha, beta, and gamma). The alpha subunit
binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but they seem to be required
for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.
[2000] The gamma subunits are small proteins (from 70 to 110 residues) that are bound to the membrane via an
isoprenyl group (either a farnesyl or a geranylgeranyl) covalently linked to their C-terminus. In mammals there are at
least 12 different isoforms of gamma subunits.

[2001] The Caenorhabditis elegans protein egl-10, which is a regulator of G-protein signalling, contains a G-protein
gamma-like domain.

[2002] A profile was developed that spans the complete length of the gamma subunit.

[1]
Pennington S.R.
Protein Prof. 2:16^15(1995). 
[2003] 816. GNS1/SUR4 family signature
PROSITE cross-reference(s): PS01188; GNS1_SUR4
[2004] The following group of eukaryotic integral membrane proteins, whose exact function has not yet clearly been
established, are evolutionarily related [1]:

- Yeast GNS1 [2], a protein involved in synthesis of 1,3-beta-glucan.

- Yeast SUR4 (or APA1, SRE1) [3], a protein that could act in a glucose-signaling pathway that controls the expression
of several genes that are transcriptionally regulated by glucose.

- Yeast hypothetical protein YJL196c.

- Caenorhabditis elegans hypothetical protein C40H1.4.

- Caenorhabditis elegans hypothetical protein D2024.3.

[2005] The proteins have from 290 to 435 amino acid residues. Structurally, they seem to be formed of three sections:
an N-terminal region with two transmembrane domains, a central hydrophilic loop and a C-terminal region that contains
from one to three transmembrane domains. A conserved region that contains three histidines was selected as a signature
pattern. This region is located in the hydrophilic loop.
Consensus pattern: L-x-F-L-H-x-Y-H-H
[1]
Bairoch A.
Unpublished observations (1996).
[2]
El-Sherbeini M., Clemas J.A.
J. Bacteriol. 177:3227-3234(1995).
[3]
Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F.
J. Biol. Chem. 269:18076-18082(1994).

[2006] 817. Immunoglobulins and major histocompatibility complex proteins signature PROSITE cross-reference(s):
PS00290; IG_MHC

[2007] The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two light chains and two heavy chains
linked by disulfide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain
(CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all
consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains
(CH1 to CH4).

[2008] The major histocompatibility complex (MHC) molecules are made of two chains. In class I [2] the alpha chain
is composed of three extracellular domains, a transmembrane region and a cytoplasmic tail. The beta chain (beta-2-microglobulin)
is composed of a single extracellular domain. In class II [3], both the alpha and the beta chains are
composed of two extracellular domains, a transmembrane region and a cytoplasmic tail.

[2009] It is known [4,5] that the Ig constant chain domains and a single extracellular domain in each type of MHC
chains are related. These homologous domains are approximately one hundred amino acids long and include a conserved
intradomain disulfide bond. A small pattern around the C-terminal cysteine involved in this disulfide bond
can be used to detect this category of Ig-related proteins.

[2010] Consensus pattern: [FY]-x-C-x-[VA]-x-H
Sequences known to belong to this class detected by the pattern:
- Ig heavy chains type Alpha C region: All, in CH2 and CH3.
- Ig heavy chains type Delta C region: All, in CH3.
- Ig heavy chains type Epsilon C region: All, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: All, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: All, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: In all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: In all CL except rabbit.
- MHC class I alpha chains: All, in alpha-3 domains, including in the cytomegalovirus MHC-1 homologous protein [6].
- Beta-2-microglobulin: All.
- MHC class II alpha chains: All, in alpha-2 domains.
- MHC class II beta chains: All, in beta-2 domains.
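
Because this short signature can occur in more than one constant domain of the same chain, a scan should report every hit rather than only the first. The Python sketch below assumes the pattern reads [FY]-x-C-x-[VA]-x-H. The function name `signature_sites` and the toy chain are hypothetical and serve only to illustrate the idea.

```python
import re

# Regex form of [FY]-x-C-x-[VA]-x-H.
IG_MHC = re.compile(r"[FY].C.[VA].H")


def signature_sites(sequence):
    """Return (1-based start, matched residues) for every non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in IG_MHC.finditer(sequence)]


# Toy chain (made up) with two sites separated by a glycine spacer,
# standing in for two constant domains of one heavy chain.
toy_chain = "AAYTCEVTHKK" + "G" * 20 + "FSCSVMHEE"

if __name__ == "__main__":
    for start, residues in signature_sites(toy_chain):
        print(start, residues)
```

A seven-residue pattern with four unconstrained positions is not very selective, so hits found this way would normally be treated as candidates to be confirmed by other evidence.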

[1]
Gough N.
Trends Biochem. Sci. 6:203-205(1981).
[2]
Klein J., Figueroa F.
Immunol. Today 7:41-44(1986).
[3]
Figueroa F., Klein J.
Immunol. Today 7:78-81(1986).
[4]
Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L.
Nature 282:266-270(1979).
[5]
Cushley W., Owen M.J.
Immunol. Today 4:88-92(1983).
[6]
Beck S., Barrell B.G.
Nature 331:269-272(1988).

[2011] 818. (IGFBP) Insulin-like growth factor binding proteins signature PROSITE cross-reference(s): PS00222;
IGF_BINDING

[2012] The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding proteins in extracellular fluids with
high affinity [1,2,3]. These IGF-binding proteins (IGFBP) prolong the half-life of the IGFs and have been shown to either
inhibit or stimulate the growth-promoting effects of the IGFs on cell culture. They seem to alter the interaction of IGFs
with their cell surface receptors. There are at least six different IGFBPs and they are structurally related.
[2013] The following growth-factor inducible proteins are structurally related to IGFBPs and could function as growth-factor
binding proteins [4,5]:

- Mouse protein cyr61 and its probable chicken homolog, protein CEF-10.
- Human connective tissue growth factor (CTGF) and its mouse homolog, protein FISP-12.
- Vertebrate protein NOV.

[2014] As a signature pattern, a conserved cysteine-rich region located in the N-terminal section of these proteins is
used.

[2015] Consensus pattern: G-C-[GS]-C-C-x(2)-C-A-x(6)-C

Sequences known to belong to this class detected by the pattern: ALL, except for IGFBP-6's.

[1]
Rechler M.M.
Vitam. Horm. 47:1-114(1993).
[2]
Shimasaki S., Ling N.
Prog. Growth Factor Res. 3:243-266(1991).
[3]
Clemmons D.R.
Trends Endocrinol. Metab. 1:412-417(1990).
[4]
Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R.
J. Cell Biol. 114:1285-1294(1991).
[5]
Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet
J., Perbal B.
Mol. Cell. Biol. 12:10-21(1992).



[2016] 819. LMWPc: Low molecular weight phosphotyrosine protein phosphatase
Number of members: 34

[2017] [1] Medline: 94329182. "The crystal structure of a low-molecular-weight phosphotyrosine protein phosphatase."
Su XD, Taddei N, Stefani M, Ramponi G, Nordlund P; Nature 1994;370:575-578.

[2018] 820. (myosin_head) ATP/GTP-binding site motif A (P-loop)

PROSITE cross-reference(s): PS00017; ATP_GTP_A
[2019] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs.
The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand
and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is
generally referred to as the "A" consensus sequence [1] or the "P-loop" [5].
[2020] There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families
for which the relevance of the presence of such motif has been noted is listed below:

- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

[2021] Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection
because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins
are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a
slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate
kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
[2022] Consensus pattern: [AG]-x(4)-G-K-[ST]
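
The consensus patterns quoted throughout this listing share a small common syntax, so they can be translated mechanically into regular expressions. The Python sketch below is a minimal illustration under stated assumptions: it handles only hyphen-separated elements of the forms `x`, a single residue, `[...]`, `{...}` and an optional repeat such as `(4)` or `(2,4)`. The helper names `prosite_to_regex` and `has_p_loop`, and both toy sequences, are hypothetical. The second toy sequence carries Gly in the last position, mimicking the adenylate kinase deviation described above, and is therefore not detected.

```python
import re

_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        parsed = _ELEMENT.fullmatch(element)
        if parsed is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = parsed.groups()
        if core == "x":
            core = "."                       # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {..} lists excluded residues
        # [..] residue classes and single letters pass through unchanged
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


P_LOOP = re.compile(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))


def has_p_loop(sequence):
    """Return True if the sequence contains the P-loop motif."""
    return P_LOOP.search(sequence) is not None


toy_gtpase = "MKLVVVGAGGVGKSALTIQ"   # made up; ends the motif with Ser
toy_variant = "MKLVVVGAGGVGKGALTIQ"  # made up; Gly in the last position

if __name__ == "__main__":
    print(P_LOOP.pattern, has_p_loop(toy_gtpase), has_p_loop(toy_variant))
```

Keeping the translation in one function means the same scan can be reused for any of the other consensus patterns in this listing that fall within the supported syntax.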

[1]
Walker J.E., Saraste M., Runswick M.J., Gay N.J.
EMBO J. 1:945-951(1982).
[2]
Moller W., Amons R.
FEBS Lett. 186:1-7(1985).
[3]
Fry D.C., Kuby S.A., Mildvan A.S.
Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4]
Dever T.E., Glynias M.J., Merrick W.C.
Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5]
Saraste M., Sibbald P.R., Wittinghofer A.
Trends Biochem. Sci. 15:430-434(1990).
[6]
Koonin E.V.
J. Mol. Biol. 229:1165-1174(1993).
[7]
Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P.
J. Bioenerg. Biomembr. 22:571-592(1990).
[8]
Hodgman T.C.
Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9]
Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P.
Nature 337:121-122(1989).
[10]
Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).

[2023] 821. PE: PE family

This family is named after a PE motif near the amino terminus of the domain. The PE family of proteins all contain an
amino-terminal region of about 110 amino acids. The carboxyl termini of this family are variable and fall into several
classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The
function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation of
Mycobacterium tuberculosis [1]. Number of members: 86

[2024] [1] Medline: 98295987. "Deciphering the biology of Mycobacterium tuberculosis from the complete genome
sequence." Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE
3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG, et al; Nature 1998;393:537-544.
[2025] 822. (RNB) Ribonuclease II family signature
PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II
[2026] On the basis of sequence similarities, the following bacterial and eukaryotic proteins seem to form a family:

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an exonuclease
involved in mRNA decay. It degrades mRNA by hydrolyzing single-stranded polyribonucleotides processively
in the 3' to 5' direction.

- Bacterial protein vacB. In Shigella flexneri, vacB has been shown to be required for the expression of virulence
genes at the posttranscriptional level.

- Yeast protein SSD1 (or SRK1), which is implicated in the control of the cell cycle G1 phase.
- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the nucleotide-releasing activity of RCC1 on ran.
- Fission yeast protein dis3, which is implicated in mitotic control.
- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3' end processing and splicing.
- Yeast protein MSU1, which is involved in mitochondrial biogenesis.
- Synechocystis strain PCC 6803 protein zam [3], which controls resistance to the carbonic anhydrase inhibitor
acetazolamide.
- Caenorhabditis elegans hypothetical protein F48E8.6.

[2027] The size of these proteins ranges from 644 residues (rnb) to 1250 (SSD1). While their sequence is highly
divergent, they share a conserved domain in their C-terminal section [4]. It is possible that this domain plays a role in
a putative exonuclease function that would be common to all these proteins. A signature pattern was developed based
on the core of this conserved domain.
2S [2028] Consensus patlem[HIJ^FYE]^GSTAM]^UVM^x(4,5)-Y-[STAL]-x^FVmC]-[TV]^SA] 
[FY]-x-D-x(3)-[HQ] 

[1]
Zilhao R., Camelo L., Arraiano C.M.
Mol. Microbiol. 8:43-51(1993).
[2]
Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N.,
Yanagida M., He X., Mueller U., Sazer S., Nishimoto T.
EMBO J. 15:5595-5605(1996).
[3]
Beuf L., Bedu S., Cami B., Joset F.
Plant Mol. Biol. 27:779-788(1995).
[4]
Mian I.S.
Nucleic Acids Res. 25:3187-3195(1997).

[2029] 823. Src homology 2 (SH2) domain profile
PROSITE cross-reference(s): PS50001; SH2

[2030] The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a
conserved sequence region between the oncoproteins Src and Fps [1]. Similar sequences were later found in many
other intracellular signal-transducing proteins [2]. SH2 domains function as regulatory modules of intracellular signalling
cascades by interacting with high affinity to phosphotyrosine-containing target peptides in a sequence-specific and
strictly phosphorylation-dependent manner [3,4,5,6].

[2031] The SH2 domain has a conserved 3D structure consisting of two alpha helices and six to seven beta-strands.
The core of the domain is formed by a continuous beta-meander composed of two connected beta-sheets [7].
[2032] So far, SH2 domains have been identified in the following proteins:

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases. In particular in
the Src, Abl, Btk, Csk and ZAP70 families of kinases.
- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two copies of the SH2 domain are
found in those proteins in between the catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).
- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.
- Some vertebrate and invertebrate protein-tyrosine phosphatases.
- Mammalian Ras GTPase-activating protein (GAP).
- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate
GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK.
- Mammalian Vav oncoprotein, a guanine-nucleotide exchange factor of the CDC24 family.
- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: oncoprotein Crk, mammalian
cytoplasmic proteins Nck, Shc.
- STAT proteins (signal transducers and activators of transcription).
- Chicken tensin.
- Yeast transcriptional control protein SPT6.

[2033] The profile developed to detect SH2 domains is based on a structural alignment consisting of 8 gap-free
blocks and 7 linker regions totaling 92 match positions.

[1] Sadowski I., Stone J.C., Pawson T., Mol. Cell. Biol. 6:4396-4408(1986).
[2] Russel R.B., Breed J., Barton G.J., FEBS Lett. 304:15-20(1992).
[3] Marangere L.E.M., Pawson T., J. Cell Sci. Suppl. 18:97-104(1994).
[4] Pawson T., Schlessinger J., Curr. Biol. 3:434-442(1993).
[5] Mayer B.J., Baltimore D., Trends Cell. Biol. 3:8-13(1993).
[6] Pawson T., Nature 373:573-580(1995).
[7] Kuriyan J., Cowburn D., Curr. Opin. Struct. Biol. 3:828-837(1993).

[2034] 824. Sulfate transporters signature

PROSITE cross-reference(s): PS01130; SULFATE_TRANSP

[2035] A number of proteins involved in the transport of sulfate across a membrane as well as some yet uncharacterized
proteins have been shown [1,2] to be evolutionary related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).
- Yeast sulfate permeases (genes SUL1 and SUL2).
- Rat sulfate anion transporter 1 (SAT-1).
- Mammalian DTDST, a probable sulfate transporter which, in Human, is involved in the genetic disease, diastrophic
  dysplasia (DTD).
- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.
- Human pendrin (gene PDS), which is involved in a number of hearing loss genetic diseases.
- Human protein DRA (Down-Regulated in Adenoma).
- Soybean early nodulin 70.
- Escherichia coli hypothetical protein ychM.
- Caenorhabditis elegans hypothetical protein F41D9.5.

[2036] As expected by their transport function, these proteins are highly hydrophobic and seem to contain about 12
transmembrane domains. The best conserved region seems to be located in the second transmembrane region and
is used as a signature pattern.

[2037] Consensus patlem[PAVl-x-Y-[GS]-L-Y4STAG](2)-x(4)-[LIWYA].[UVST]-[YI]-x(3)-[GAHGST]-^ 






[1] Sandal N.N., Marcker K.A., Trends Biochem. Sci. 19:19-19(1994).
[2] Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T., Mol. Gen. Genet. 247:709-715(1995).

[2038] 825. TYA: TYA transposon protein

Ty are yeast transposons. A 5.7kb transcript codes for p3, a fusion protein of TYA and TYB. The TYA protein is analogous
to the gag protein of retroviruses. TYA is cleaved to form a 46kd protein which can form mature virion-like particles [1].
Number of members: 59

[2039] [1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon virus-like particles.
Palmer KJ, Tichelaar W, Myers Burns NR, Butcher SJ, Kingsman AJ, Fuller SD, Saibil HR; J Virol 1997;71:6863-6868.
[2040] 826. Aldolase_II

Class II Aldolase and Adducin N-terminal domain.

- This family includes class II aldolases and adducins which have not been ascribed any enzymatic function. Number
  of members: 37

References:
[2041]

[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase from Escherichia coli.
Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.

[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from Escherichia coli as
derived from the structure. Dreyer MK, Schulz GE; J Mol Biol 1996;259:458-466.

[2042] 827. CBD_2

- Two tryptophan residues are involved in cellulose binding.

- Cellulose binding domain found in bacteria. Number of members: 51

References:

[2043] [1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas fimi by nuclear
magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG, Muhandiram DR, Harris-Brandts M, Carver
JP, Kay LE, Harvey TS; Biochemistry 1995;34:6993-7009.
[2044] 828. P

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an additional highly conserved
sequence of approximately 150 residues (P domain) located immediately downstream of the catalytic domain.
Number of members: 91

References:

[2045]

[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is required for intramolecular
N-terminal maturation of pro-Kex2 protease. Gluschankof P, Fuller RS; EMBO J 1994;13:2280-2288.

[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone convertases. Zhou A,
Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem 1998;273:11107-11114.

[2046] 829. Uncharacterized protein family UPF0020 signature
PROSITE cross-reference(s): PS01261; UPF0020

The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein ypsC.
- Synechocystis strain PCC 6803 hypothetical protein slr0064.
- Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710.

[2047] These are hydrophilic proteins of from 40 Kd to about 80 Kd. They can be picked up in the database by the
following pattern.

[2048] Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
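
As an illustration of how a consensus pattern of this kind can be used to pick sequences out of a database, the following minimal Python sketch translates the PROSITE-style notation (fixed residues, [..] allowed classes, {..} excluded classes, x wildcards and (n) or (n,m) repeats) into a regular expression and scans a sequence with it. The function names, the reading of the second pattern element as [LIVMF], and the test sequence are assumptions made for this example only; they are not part of the original disclosure.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().split("-"):
        # Separate the residue specification from an optional repeat, e.g. x(3) or x(4,5).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            regex = core                       # [ABC] class or a single fixed residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

def find_hits(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in rx.finditer(sequence)]

# Assumed reading of the UPF0020 pattern; the sequence is invented for illustration.
UPF0020 = "D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E"
print(prosite_to_regex(UPF0020))
print(find_hits(UPF0020, "MKDPLCGSGAAALEK"))
```

The sketch reports non-overlapping hits only and does not handle the N- or C-terminal anchors that some patterns carry.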

References:

[2049] [1] Bairoch A. Unpublished observations (1997).
[2050] 830. Uncharacterized protein family UPF0031 signatures

PROSITE cross-reference(s): PS01049; UPF0031_1; PS01050; UPF0031_2
The following uncharacterized proteins have been shown [1] to share regions of similarities:

- Yeast chromosome XI hypothetical protein YKL151c.
- Caenorhabditis elegans hypothetical protein R107.2.
- Escherichia coli hypothetical protein yjeF.
- Bacillus subtilis hypothetical protein yxkO.
- Helicobacter pylori hypothetical protein HP1383.
- Mycobacterium tuberculosis hypothetical protein MtCY77.05c.
- Mycobacterium leprae hypothetical protein B229.C2_201.
- Synechocystis strain PCC 6803 hypothetical protein sll1433.
- Methanococcus jannaschii hypothetical protein MJ1586.

[2051] These are proteins of about 30 to 40 Kd whose central region is well conserved. They can be picked up in
the database by the following patterns.

[2052] Consensus pattern: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]
Consensus pattern: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM]
[2053] 831. (ACOX)
Acyl-CoA oxidase

[2054] This is a family of Acyl-CoA oxidases EC:1.3.3.6. Acyl-CoA oxidase converts acyl-CoA into trans-2-enoyl-CoA
[1].

Number of members: 39

[2055] [1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline: 98192624. Molecular
characterization of a glyoxysomal long chain acyl-CoA oxidase that is synthesized as a precursor of higher molecular
mass in pumpkin. J Biol Chem 1998;273:8301-8307.
[2056] 832. (AICARFT_IMPCHas)
AICARFT/IMPCHase bienzyme

[2057] This is a family of bifunctional enzymes catalysing the last steps in de novo purine biosynthesis. The bifunctional
enzyme is found in both prokaryotes and eukaryotes. The second last step is catalysed by 5-aminoimidazole-4-carboxamide
ribonucleotide formyltransferase EC:2.1.2.3 (AICARFT); this enzyme catalyses the formylation of AICAR
with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is catalysed by IMP (inosine
monophosphate) cyclohydrolase EC:3.5.4.10 (IMPCHase), cyclizing FAICAR (5-formylaminoimidazole-4-carboxamide
ribonucleotide) to IMP [1].

Number of members: 22

[2058]

[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y, Nomura S, Tsukamoto I;
Medline: 97473523 Molecular cloning and expression of a rat cDNA encoding 5-aminoimidazole-4-carboxamide
ribonucleotide formyltransferase/IMP cyclohydrolase [published erratum appears in Gene 1998 Feb 27;208(2):337]
Gene 1997;197:289-293.

[2] Rayl EA, Moroson BA, Beardsley GP; Medline: 96147205 The human purH gene product, 5-aminoimidazole-4-carboxamide
ribonucleotide formyltransferase/IMP cyclohydrolase. Cloning, sequencing, expression, purification,
kinetic analysis, and domain mapping. J Biol Chem 1996;271:2225-2233.

[2059] 833. (AOX)
Alternative oxidase

[2060] The alternative oxidase is used as a second terminal oxidase in the mitochondria; electrons are transferred
directly from reduced ubiquinol to oxygen, forming water [2]. This is not coupled to ATP synthesis and is not inhibited
by cyanide; this pathway is a single step process [1]. In rice the transcript levels of the alternative oxidase are increased
by low temperature [1].

Number of members: 27

[2061]

[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211 Transcript levels of tandem-arranged
alternative oxidase genes in rice are increased by low temperature. Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline: 96366413 Cloning and analysis
of the alternative oxidase gene of Neurospora crassa. Genetics 1996;142:129-140.

[2062] 834. (APH)

Protein kinases signatures and profile

[2063] Cross-reference(s): PS00107; PROTEIN_KINASE_ATP, PS00108; PROTEIN_KINASE_ST, PS00109;
PROTEIN_KINASE_TYR, PS50011; PROTEIN_KINASE_DOM

[2064] Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share
a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of
conserved regions in the catalytic domain of protein kinases. Two of these regions have been selected to build signature
patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region,
which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme [6]; two signature patterns were derived for that region: one specific for
serine/threonine kinases and the other for tyrosine kinases. A profile was developed which is based on the alignment in [1]
and covers the entire catalytic domain.

[2065] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-
[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]

[2066] Sequences known to belong to this class detected by the pattern: the majority of known protein kinases, but it
fails to find a number of them, especially viral kinases which are quite divergent in this region and are completely
missed by this pattern.

[2067] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]
[2068] Sequences known to belong to this class detected by the pattern: most serine/threonine specific protein
kinases with 10 exceptions (half of them viral kinases) and also Epstein-Barr virus BGLF4 and Drosophila ninaC which
have respectively Ser and Arg instead of the conserved Lys and which are therefore detected by the tyrosine kinase
specific pattern described below.

[2069] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]
Sequences known to belong to this class detected by the pattern: tyrosine specific protein kinases with the exception
of human ERBB3 and mouse blk. This pattern will also detect most bacterial aminoglycoside phosphotransferases [8,9]
and herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionary related to protein kinases.
Sequences known to belong to this class detected by the profile: ALL, except for three viral kinases. This profile also
detects receptor guanylate cyclases (see <PDOC00430>) and 2-5A-dependent ribonucleases. Sequence similarities
between these two families and the eukaryotic protein kinase family have been noticed before. It also detects
Arabidopsis thaliana kinase-like protein TMKL1 which seems to have lost its catalytic activity.

[2070] Note: if a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase
is close to 100%. Note: eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus
xanthus [11] and Yersinia pseudotuberculosis. Note: the patterns shown above have been updated since their publication
in [7]. Note: this documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.
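
The note above, that a protein carrying both kinase signatures is very likely a protein kinase, can be sketched as a simple scan. The following Python example is illustrative only: the regular expressions are hand translations of the three consensus patterns as they are read here from the text (an assumed reading), "both signatures" is interpreted as the ATP-binding pattern plus either active-site pattern, and the function names and the example sequence are hypothetical.

```python
import re

# Regex forms of the three signature patterns (assumed reading of the text above).
SIGNATURES = {
    "atp_binding": re.compile(
        r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
        r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"),
    "ser_thr_active_site": re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}"),
    "tyr_active_site": re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}"),
}

def kinase_signatures(sequence):
    """Report, for each signature, whether it occurs anywhere in the sequence."""
    return {name: rx.search(sequence) is not None for name, rx in SIGNATURES.items()}

def has_both_signatures(sequence):
    """True when the ATP-binding signature and one active-site signature are both present."""
    hits = kinase_signatures(sequence)
    return hits["atp_binding"] and (hits["ser_thr_active_site"] or hits["tyr_active_site"])

# Invented sequence containing an ATP-binding-like and a Ser/Thr active-site-like stretch.
EXAMPLE = "AAALGTGSFGRVMLVKHKETGNHYAMKGGGLIYRDLKPENLLIAAA"
print(kinase_signatures(EXAMPLE))
print(has_both_signatures(EXAMPLE))
```

As the entry itself points out, a profile search is more sensitive than these patterns; the sketch only covers the pattern-based check.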






References
[2071]

[ 1] Hanks S.K., Hunter T., FASEB J. 9:576-596(1995).
[ 2] Hunter T., Meth. Enzymol. 200:3-37(1991).
[ 3] Hanks S.K., Quinn A.M., Meth. Enzymol. 200:38-62(1991).
[ 4] Hanks S.K., Curr. Opin. Struct. Biol. 1:369-383(1991).
[ 5] Hanks S.K., Quinn A.M., Hunter T., Science 241:42-52(1988).
[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M., Science 253:407-414(1991).
[ 7] Bairoch A., Claverie J.-M., Nature 331:22(1988).
[ 8] Benner S., Nature 329:21-21(1987).
[ 9] Kirby R., J. Mol. Evol. 30:489-492(1992).
[10] Littler E., Stuart A.D., Chee M.S., Nature 358:160-162(1992).
[11] Munoz-Dorado J., Inouye S., Inouye M., Cell 67:995-1006(1991).

[2072] 835. (Asp_Glu_race)
Aspartate and glutamate racemases signatures
[2073] Cross-reference(s) PS00923; ASP_GLU_RACEMASE_1 PS00924; ASP_GLU_RACEMASE_2

[2074] Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionary related bacterial
enzymes that do not seem to require a cofactor for their activity [1]. Glutamate racemase, which interconverts
L-glutamate into D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics such as
gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a hypothetical
protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two conserved cysteines are present in the
sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in proton abstraction
from the substrate. Signature patterns were developed for both cysteines.

[2075] Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]
Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM]

[2076] [ 1] Gallo K.A., Knowles J.R., Biochemistry 32:3981-3990(1993).

[2077] 836. (ATP-sulfurylase)

ATP-sulfurylase

[2078] This family consists of ATP-sulfurylase or sulfate adenylyltransferase EC:2.7.7.4, some of which are part of a
bifunctional polypeptide chain associated with adenosyl phosphosulphate (APS) kinase (APS_kinase). Both enzymes
are required for PAPS (phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate (APS) from ATP and inorganic sulphate [1].

Number of members: 37

[2079]

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz NB; Medline: 98337975
A member of a family of sulfate-activating enzymes causes murine brachymorphism [published erratum appears
in Proc Natl Acad Sci U S A 1998 Sep 29;95(20):12071] Proc Natl Acad Sci U S A 1998;95:8681-8685.

[2] Rosenthal E, Leustek T; Medline: 96096529 A multifunctional Urechis caupo protein, PAPS synthetase, has
both ATP sulfurylase and APS kinase activities. Gene 1995;165:243-248.

[2080] 837. (ATP-synt_F)
ATP synthase (F/14-kDa) subunit

[2081] This family includes 14-kDa subunit from vATPases [1], which is in the peripheral catalytic part of the complex
[2]. The family also includes archaebacterial ATP synthase subunit F [3].

Number of members: 23

[2082]

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411 The Drosophila melanogaster gene vha14 encoding
a 14-kDa F-subunit of the vacuolar ATPase. Gene 1996;172:239-243.

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416 Identification of a 14-kDa subunit associated
with the catalytic sector of clathrin-coated vesicle H+-ATPase. J Biol Chem 1996;271:3324-3327.

[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324968 Subunit structure and organization
of the genes of the A1A0 ATPase from the Archaeon Methanosarcina mazei Go1. J Biol Chem 1996;271:18843-18852.

[2083] 838. (CBD_4) 
Starch binding domain 

Number of members: 48 

[2084] 839. (CbiX) 

[2085] The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and so may have a
related function. Some CbiX proteins contain a striking histidine-rich region at their C-terminus, which suggests that it
might be involved in metal chelation [1].

Number of members: 6

[2086] [1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126 Cobalamin (vitamin B12)
biosynthesis: identification and characterization of a Bacillus megaterium cobI operon. Biochem J 1998;335:159-166.

840. (Complex1_51K)

[2087] Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures
Cross-reference(s) PS00644; COMPLEX1_51K_1 PS00645; COMPLEX1_51K_2

[2088] Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 51 Kd (in mammals),
which is the second largest subunit of complex I and is a component of the iron-sulfur (IP) fragment of the enzyme. It
seems to bind to NAD, FMN, and a 2Fe-2S cluster.
[2089] The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF) which also binds to NAD, FMN,
  and a 2Fe-2S cluster.
- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.
- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

[2090] The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of sequence similarities.
The first one most probably corresponds to the NAD-binding site, the second to the FMN-binding site, and the
third one, which contains three cysteines, to the iron-sulfur binding region. Signature patterns have been developed
for the FMN-binding and for the 2Fe-2S binding regions.

[2091] Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S
Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands]

[ 1] Ragan C.I., Curr. Top. Bioenerg. 15:1-36(1987).
[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D., Eur. J. Biochem. 197:563-576(1991).
[ 3] Fearnley I.M., Walker J.E., Biochim. Biophys. Acta 1140:105-134(1992).
[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H., J. Mol. Biol. 233:109-122(1993).

[2092] 841. (DAP_epimerase)
Diaminopimelate epimerase signature
[2093] Cross-reference(s) PS01326; DAP_EPIMERASE
Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the
biosynthetic pathway leading from aspartate to lysine. This enzyme is a protein of about 30 Kd. Two conserved cysteines
seem [1] to function as the acid and base in the catalytic mechanism. As a signature pattern, the region surrounding
the first of these two active site cysteines was selected.
[2094] Consensus pattern: N-x-D-G-S-x(4)-C-G-N-[GA]-x-R [C is an active site residue] Sequences known to belong
to this class detected by the pattern: ALL, except for an Anabaena dapF which has a Ser instead of the active site Cys.
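
A short sketch of how the position of the putative active-site residue could be reported for this signature is given below. It is illustrative only: the strict expression is a hand translation of the consensus pattern above, the relaxed variant that also accepts Ser at the active-site position (as described for the Anabaena dapF) is an assumption introduced for this example, and the function name and test sequences are hypothetical.

```python
import re

# Strict form of N-x-D-G-S-x(4)-C-G-N-[GA]-x-R; the capture group marks the active-site Cys.
STRICT = re.compile(r"N.DGS.{4}(C)GN[GA].R")
# Relaxed variant (an assumption for illustration) that tolerates Ser at that position.
RELAXED = re.compile(r"N.DGS.{4}([CS])GN[GA].R")

def active_site(sequence, relaxed=False):
    """Return (1-based position, residue) of the putative active-site residue, or None."""
    m = (RELAXED if relaxed else STRICT).search(sequence)
    if m is None:
        return None
    return (m.start(1) + 1, m.group(1))

# Invented sequences: one with Cys and one with Ser at the active-site position.
print(active_site("MMNADGSAAAACGNGARKK"))
print(active_site("MMNADGSAAAASGNGARKK"))
print(active_site("MMNADGSAAAASGNGARKK", relaxed=True))
```

Relaxing the pattern in this way would be expected to increase the risk of false hits, so the strict form remains the signature as stated.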
[2095] [ 1] Cirilli M., Zheng R., Scapin G., Blanchard J.S., Biochemistry 37:16452-16458(1998).
[2096] 842. (DNA_gyraseB_C)
DNA topoisomerase II signature

[2097] Cross-reference(s) PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and
in African Swine Fever virus (ASF). In bacteriophage T4 topoisomerase II consists of three subunits (the product of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation; it also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[2098] There are many regions of sequence homology between the different subtypes of topoisomerase II. The
relation between the different subunits is shown in the following representation:

<---------------About-1400-residues--------------->
[----Protein 39-*----][-------Protein 52--------]  Phage T4
[-------gyrB-*-------][----------gyrA-----------]  Prokaryote II, Archaebacteria
[-------parE-*-------][----------parC-----------]  Prokaryote IV
[------------*----------------------------------]  Eukaryote and ASF

'*': Position of the pattern.

[2099] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[2100] Consensus pattern: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG]

[ 1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).
[ 2] Bjornsti M.-A., Curr. Opin. Struct. Biol. 1:99-103(1991).
[ 3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).
[ 4] Roca J., Trends Biochem. Sci. 20:156-160(1995).

[2101] 843. (DUF16)
Protein of unknown function

[2102] The function of this protein is unknown. It appears to only occur in Mycoplasma pneumoniae.
Number of members: 26

[2103] [1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885 Complete sequence
analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res 1996;24:4420-4449.
[2104] 844. (DUF21)
[2105] Domain of unknown function

[2106] This transmembrane region has no known function. Many of the sequences in this family are annotated as
hemolysins; however, this is due to a similarity to Swiss:Q54318 that does not contain this domain. This domain is
found in the N-terminus of the proteins adjacent to two intracellular CBS domains (CBS).
Number of members: 42

[2107] 845. (DUF56)
[2108] Integral membrane protein
[2109] The members of this family are putative integral membrane proteins. The function of the family is unknown;
however, the family includes Sec59 from yeast. Sec59 is a dolichol kinase EC:2.7.1.108, but it is not clear if the enzymatic
activity resides in this region or its N terminal region.

Number of members: 13

[2110] 846. (DUF94)

[2111] Domain of unknown function

[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment
contains a completely conserved aspartate residue that may be functionally important. The eukaryotic domains contain
three conserved cysteines and a histidine that might be metal binding; however, these are absent in the archaebacterial
proteins.

Number of members: 9

[2113] 847. (FF)
[2114] FF domain

[2115] This domain may be involved in protein-protein interaction [1].
Number of members: 42

[2116] [1] Bedford MT, Leder P; Medline: 99322199 The FF domain: a novel motif that often accompanies WW
domains. Trends Biochem Sci 1999;24:264-265.
[2117] 848. (FLO_LFY)
Floricaula / Leafy protein

[2118] This family consists of various plant development proteins which are homologues of floricaula (FLO) and Leafy
(LFY) proteins which are floral meristem identity proteins. Mutations in the sequences of these proteins affect flower
and leaf development.

Number of members: 16

[2119]

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline: 97411151 UNIFOLIATA
regulates leaf and flower morphogenesis in pea. Curr Biol 1997;7:581-587.
[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 LEAFY controls floral meristem
identity in Arabidopsis. Cell 1992;69:843-859.

[2120] 849. (G-patch)
G-patch domain

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA
binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly
conserved glycines.

Number of members: 47

[2122] [1] Aravind L, Koonin EV; Medline: 10470032 G-patch: a new conserved domain in eukaryotic RNA-processing
proteins and type D retroviral polyproteins. Trends Biochem Sci 1999;24:342-344.
[2123] 850. (Gram-ve_porins)
General diffusion Gram-negative porins signature
[2124] Cross-reference(s) PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins, known
as porins [1], are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled
channels which allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general
diffusion channels that allow any solutes up to a certain size (that size is known as the exclusion limit) to cross the
membrane, while other porins are specific for a solute and contain a binding site for that solute inside the pores (these
are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites
for the binding of phages and bacteriocins. General diffusion porins generally assemble as trimers in the membrane
and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [3] that
a number of general porins are evolutionary related; these porins are:

- Enterobacteria phoE.
- Enterobacteria ompC.
- Enterobacteria ompF.
- Enterobacteria nmpC.
- Bacteriophage PA-2 LC.
- Neisseria PI.A.
- Neisseria PI.B.

[2125] As a signature pattern a conserved region was selected, located in the C-terminal part of these proteins,
which spans two putative transmembrane beta strands.

[2126] Consensus pattern: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).
[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).
[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD)
HlyD family secretion proteins signature

[2128] Cross-reference(s) PS00543; HLYD_FAMILY

Gram-negative bacteria produce a number of proteins which are secreted into the growth medium by a mechanism
that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, require
the help of two or more proteins for their secretion across the cell envelope. Amongst these are a protein belonging to the
ABC transporters family (see the relevant entry <PDOC00185>) and a protein belonging to a family which is currently
composed [1 to 5] of the following members:

Gene   Species                     Protein which is exported
hlyD   Escherichia coli            Hemolysin
appD   A. pleuropneumoniae         Hemolysin
lcnD   Lactococcus lactis          Lactococcin A
lktD   A. actinomycetemcomitans,   Leukotoxin
       Pasteurella haemolytica
rtxD   A. pleuropneumoniae         Toxin-III
cyaD   Bordetella pertussis        Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA   Escherichia coli            Colicin V
prtE   Erwinia chrysanthemi        Extracellular proteases B and C
aprE   Pseudomonas aeruginosa      Alkaline protease
emrA   Escherichia coli            Drugs and toxins
yjcR   Escherichia coli            Unknown

These proteins are evolutionary related and consist of from 390 to 480 amino acid residues. They seem to be anchored
in the inner membrane by a N-terminal transmembrane region. Their exact role in the secretion process is not yet
known. The C-terminal section of these proteins is the best conserved region; a signature pattern from that region was
derived.

[2129] Consensus pattem: [UVM]-x(2)-G-tLM]-x(3)-(STGAV]-x4LIVMTl-x-[LIVI^4GE]-x-(KR]-x-[UV 
ILIVMFYW1{3) 

Sequences known to belong to this class detected by the pattern: ALL, except for emrA and yjcR.

References:
[2130]

[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990).
[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).
[3] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L., Appl. Environ. Microbiol. 58:1952-1961(1992).
[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).

10 

[2131] 852. (IBR)
In Between Ring fingers

[2132] The IBR (In Between Ring fingers) domain is found to occur between pairs of ring fingers (zf-C3HC4). The
function of this domain is unknown. This domain has also been called the C6HC domain and DRIL (for double RING
finger linked) domain [2].

Number of members: 25

[2133]

[1] Morett E, Bork P; Medline: 10366851 A novel transactivation domain in parkin. Trends Biochem Sci 1999;24:
229-231.

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline: 99349709 TRIADS: a new
class of proteins with a novel cysteine-rich signature. Protein Sci 1999;8:1557-1561.

25 

[2134] 853. (IPPT) 
IPP transferase 

[1] Durand JM, Bjork GR, Kuwae A, Yoshikawa M, Sasakawa C; Medline: 97440126 The modified nucleoside
2-methylthio-N6-isopentenyladenosine in tRNA of Shigella flexneri is required for expression of virulence genes.
J Bacteriol 1997;179:5777-5782.

[2] Boguta M, Hunter LA, Shen WC, Gillman EC, Martin NC, Hopper AK; Medline: 94187700 Subcellular locations
of MOD5 proteins: mapping of sequences sufficient for targeting to mitochondria and demonstration that mitochondrial
and nuclear isoforms commingle in the cytosol. Mol Cell Biol 1994;14:2298-2306.

[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91203856 MOD5 translation initiation sites determine
N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA. Mol Cell Biol 1991;11:2382-2390.

[2135] 854. (KE2) 
KE2 family protein 

[2136] The function of members of this family is unknown, although they have been suggested to contain a DNA
binding leucine zipper motif [2].

Number of members: 9

[2137]

[1] Ha H, Abe K, Artzt K; Medline: 92084131 Primary structure of the embryo-expressed gene KE2 from the mouse
H-2K region. Gene 1991;107:345-346.

[2] Shang HS, Wong SM, Tan HM, Wu M; Medline: 95129859 YKE2, a yeast nuclear gene encoding a protein
showing homology to mouse KE2 and containing a putative leucine-zipper motif. Gene 1994;151:197-201.

[2138] 855. (Lipoprotein_6)
Prokaryotic membrane lipoprotein lipid attachment site
[2139] Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN
In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natronobacterium pharaonis [4], a membrane-associated copper-binding protein. (This is the first archaebacterial protein known to be modified in such a fashion.)

[2140] From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this type of post-translational modification were derived.
[2141] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site] Additional rules:

[2142] 1) The cysteine must be between positions 15 and 35 of the sequence in consideration. 2) There must be at least one Lys or one Arg in the first seven positions of the sequence. Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are not membrane lipoproteins, but at least half of them could be.
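
The consensus pattern and the two additional rules can be combined into a short screening routine. The following is a minimal sketch in Python, assuming the usual PROSITE reading of the pattern (braces exclude the listed residues, a number in parentheses is a repeat count) and reading the first two bracketed sets as [LIVMFWSTAG] and [LIVMFYSTAGCQ]. The function name and the test sequences are hypothetical illustrations and do not come from the present disclosure.

```python
import re

# Regex form of {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.
# The lookahead lets overlapping candidate sites all be examined.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def find_lipid_attachment_site(seq):
    """Return the 1-based position of the candidate lipid-attachment Cys, or None."""
    seq = seq.upper()
    # Rule 2: at least one Lys or Arg in the first seven positions.
    if not any(res in "KR" for res in seq[:7]):
        return None
    for m in LIPOBOX.finditer(seq):
        # m.start() is 0-based, so start + motif length is the 1-based Cys position.
        cys_pos = m.start() + len(m.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35.
        if 15 <= cys_pos <= 35:
            return cys_pos
    return None


if __name__ == "__main__":
    # Hypothetical precursor-like sequence used only to exercise the rules.
    print(find_lipid_attachment_site("MKATKLVLGAVILGSTLLAGCSSNAKIDQ"))
```

A sequence passes only if the pattern matches and both positional rules hold, so a hit is a candidate for further inspection and not proof of lipid modification.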

References 

[2143] 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989).

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945(1994).

[2144] 856. (Lipoprotein_7)
Adhesin lipoprotein
[2145] This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins from Mycoplasma hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic arthritis [1]. An adhesin is a cell surface molecule that mediates adhesion to other cells or to the surrounding surface or substrate. The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a periodic peptide structure, and is highly immunogenic in the human host [1]. p50 is also a 50-kDa lipoprotein, having three repeats A, B and C, that may be a tetramer of 191 kDa in its native environment [2].

Number of members: 18

[2146]

[1] Zhang Q, Wise KS; Medline: 96294788 "Molecular basis of size and antigenic variation of a Mycoplasma hominis adhesin encoded by divergent vaa genes." Infect Immun 1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675 "Repetitive elements of the Mycoplasma hominis adhesin p50 can be differentiated by monoclonal antibodies." Infect Immun 1996;64:4027-4034.

[2147] 857. (MaoC_like)
MaoC like domain

[2148] The MaoC protein is found to share similarity with a wide variety of enzymes: estradiol 17 beta-dehydrogenase 4, peroxisomal hydratase-dehydrogenase-epimerase, and fatty acid synthase beta subunit. All these enzymes contain other domains. This domain is also present in the NodN nodulation protein N. No specific function has been assigned to this region of any of these proteins. The maoC gene is part of an operon with maoA, which is involved in the synthesis of monoamine oxidase [1].

Number of members: 46

[2149] [1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A monoamine-regulated Klebsiella aerogenes operon containing the monoamine oxidase structural gene (maoA) and the maoC gene." J Bacteriol 1992;174:2485-2492.
[2150] 858. (MSP)

Manganese-stabilizing protein / photosystem II polypeptide

[2151] This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving complex (OEC) of plants and cyanobacteria. The protein is also known as the manganese-stabilizing protein as it is associated with the manganese complex of the OEC and may provide the ligands for the complex [1].

Number of members: 17

[2152] [1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and mutational analysis of the gene encoding the Photosystem II manganese-stabilizing polypeptide of Synechocystis 6803." Mol Gen Genet 1988;212:418-425.
[2153] 859. (NAC) 

[2154] [1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV; Medline: 99342100 "Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell." Genome Res 1999;9:608-628.

Number of members: 27 

[2155] 860. (Nop)

Putative snoRNA binding domain

[2156] This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is unknown; however, it may be a common RNA or snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499 from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is suggested to function with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with Nop1p and are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae [3].

Number of members: 23
[2157]

[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98298165 "Nop5p is a small nucleolar ribonucleoprotein component required for pre-18 S rRNA processing in yeast." J Biol Chem 1998;273:16453-16463.
[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 6038777 "Nucleolar KKE/D repeat proteins Nop56p and Nop58p interact with Nop1p and are required for ribosome biogenesis." Mol Cell Biol 1997;17:7088-7098.
[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869 "The PRP31 gene encodes a novel protein required for pre-mRNA splicing in Saccharomyces cerevisiae." Nucleic Acids Res 1996;24:1164-1170.

[2158] 861. (Nramp)

Natural resistance-associated macrophage protein

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and yeast proteins Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins defined by a conserved hydrophobic core of ten transmembrane domains [5]. The members of this family of membrane proteins are divalent cation transporters. Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the membrane of a phagosome upon phagocytosis [1]. By controlling divalent cation concentrations, Nramp1 may regulate the interphagosomal replication of bacteria [1]. Mutations in Nramp1 may genetically predispose an individual to susceptibility to diseases including leprosy and tuberculosis; conversely, they might provide protection from rheumatoid arthritis [1]. Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+, amongst others; it is expressed at high levels in the intestine and is a major transferrin-independent iron uptake system in mammals [1]. The yeast proteins Smf1 and Smf2 may also transport divalent cations [3].

Number of members: 36
[2159]

[1] Govoni G, Gros P; Medline: 98383996 "Macrophage NRAMP1 and its role in resistance to microbial infections." Inflamm Res 1998;47:277-284.

[2] Agranoff DD, Krishna S; Medline: 98294035 "Metal ion homeostasis and intracellular parasitism." Mol Microbiol 1998;28:403-412.

[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030569 "Functional complementation of the yeast divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated macrophage protein family." J Biol Chem 1997;272:28933-28938.

[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 "Resistance to intracellular infections: comparative genomic analysis of Nramp." Trends Genet 1996;12:201-204.

[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline: 96036029 "Nramp defines a family of membrane proteins." Proc Natl Acad Sci U S A 1995;92:10089-10093.

[2160] 862. (NTP_transf_2)
Nucleotidyltransferase domain

Members of this family belong to a large family of nucleotidyltransferases [1].

Number of members: 83

[2161] [1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient nucleotidyltransferase superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P)

Paramyxovirus P phosphoprotein

[2163] This family consists of paramyxovirus P phosphoprotein from Sendai virus and human and bovine parainfluenza viruses. The P protein is an essential part of the viral RNA polymerase complex formed from the P and L proteins [1]. The exact role of the P protein in this complex is unknown, but it is involved in multiple protein-protein interactions and in binding the polymerase complex to the nucleocapsid or ribonucleoprotein template [1]. It also appears to be important for the proper folding of the L protein [1]. The paramyxoviruses have a negative sense ssRNA genome [1].

Number of members: 15
[2164]

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of Individual Functions of the Sendai Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.

[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868 "The P gene of human parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein." J Virol 1991;65:3406-3410.

[2165] 864. (Patatin)

[2166] This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40% of the total soluble protein in potato tubers [2]. Patatin is a storage protein but it also has the enzymatic activity of lipid acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids [2].

Number of members: 21

[2167]

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a non-sucrose-inducible patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 "Molecular characterization of the patatin multigene family of potato." Gene 1988;62:27-44.

[2168] 865. (Pentapeptide_2)
Pentapeptide repeats (8 copies)

[2169] These repeats are found in many mycobacterial proteins. These repeats are most common in the PPE family of proteins, where they are found in the MPTR subfamily of PPE proteins. The function of these repeats is unknown. The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to Pentapeptide [1]; however, it is not clear if these two families are structurally related.

Number of members: 362
[2170]

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of pentapeptide repeats in bacteria." Protein Sci 1998;7:1477-1480.

[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; Medline: 98295987 "Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence." Nature 1998;393:537-544.

[2171] 866. (Peptidase_C13)
Peptidase C13 family

This family of peptidases is known as the hemoglobinase family because it contains a globin degrading enzyme from blood parasites Swiss:P42665. However, relatives are found in plants and other organisms that have other functions. Members of this family are asparaginyl peptidases [1].

Number of members: 26

[2172] [1] Chen JM, Dando PM, Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E, Watts C, Barrett AJ; Medline: 97218252 "Cloning, isolation, and characterization of mammalian legumain, an asparaginyl endopeptidase." J Biol Chem 1997;272:8090-8098.
[2173] 867. (Pro_dh) 
Proline dehydrogenase 

Number of members: 25

[2174] [1] Ling M, Allen SW, Wood JM; Medline: 95055736 "Sequence analysis identifies the proline dehydrogenase and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the multifunctional Escherichia coli PutA protein." J Mol Biol 1994;243:950-956.
[2175] 868. (PsbP)

[2176] This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II, or PsbP, from various plants (where it is encoded by the nuclear genome) and Cyanobacteria. The 23 kDa PsbP protein is required for PSII to be fully operational in vivo; it increases the affinity of the water oxidation site for Cl- and provides the conditions required for high affinity binding of Ca2+ [2].

Number of members: 25

[2177]

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation and photoinhibition are competing in a mutant of Chlamydomonas reinhardtii lacking the 23-kDa extrinsic subunit of photosystem II." J Biol Chem 1996;271:28918-28924.

[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 "Nucleotide sequence of the psbP gene encoding precursor of 23-kDa polypeptide of oxygen-evolving complex in Arabidopsis thaliana and its expression in the wild-type and a constitutively photomorphogenic mutant." DNA Res 1996;3:277-285.

[2178] 869. (PUA)

[2179] The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally, the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation initiation factor eIF1/SUI1; these proteins may comprise a novel type of translation factors. Unexpectedly, the PUA domain was detected also in bacterial and yeast glutamate kinases; this is compatible with the demonstrated role of these enzymes in the regulation of the expression of other genes [1]. It is predicted that the PUA domain is an RNA binding domain.

Number of members: 46

[2180] [1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains associated with the translation machinery." J Mol Evol 1999;48:291-302.
[2181] 870. (RF1)
eRF1-like proteins

[2182] Members of this family are peptide chain release factors. The eukaryotic Release Factor 1 proteins (eRF1s) are involved in termination of translation. The eRF1 protein is functional for all stop codons and appears to abolish read-through of these codons. This family also includes other proteins for which the precise molecular function is unknown. Many of them are from Archaebacteria. These proteins may also be involved in translation termination but this awaits experimental verification.

Number of members: 25

[2183]

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I, Haenni AL, Celis JE, Philippe M, et al; Medline: 95082951 "A highly conserved eukaryotic protein family possessing properties of polypeptide chain release factor." [see comments] Nature 1994;372:701-703.

[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL; Medline: 97315314 "Eukaryotic release factor 1 (eRF1) abolishes readthrough and competes with suppressor tRNAs at all three termination codons in messenger RNA." Nucleic Acids Res 1997;25:2254-2258.

[2184] 871. (Ribosomal_L14e)
Ribosomal protein L14
[2185] This family includes the eukaryotic ribosomal protein L14.

Number of members: 15

[2186] 872. (Ribosomal_S27)
Ribosomal protein S27a

[2187] This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a transient metabolic stabilization and is required for efficient ribosome biogenesis [3]. The ribosomal extension protein S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a source of proteins [2].

Number of members: 36 

[2188] 873. (Spermine_synth)
Spermine/spermidine synthase

[2189] Spermine and spermidine are polyamines. This family includes spermidine synthase that catalyses the fifth (last) step in the biosynthesis of spermidine from arginine, and spermine synthase.

Number of members: 39 

[2190]

[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of two chicken cDNAs encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino acids." Gene 1997;195:313-319.
[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin extension as ribosomal protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors are ribosomal proteins whose fusion to ubiquitin facilitates ribosome biogenesis." Nature 1989;338:394-401.

[2191] 874. (Surp) Surp module

[1] Denhez F, Lafyatis R; Medline: 94266805 "Conservation of regulated alternative splicing and identification of functional domains in vertebrate homologs to the Drosophila splicing regulator, suppressor-of-white-apricot." J Biol Chem 1994;269:16170-16179.

[2192] This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot. It has been suggested that these domains may be RNA binding [1].

Number of members: 32 

[2193] 875. (TFIIE) TFIIE alpha subunit
The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and TFIIE-beta Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of the conserved amino terminal region of eukaryotic TFIIE-alpha [2] and proteins from archaebacteria that are also presumed to be TFIIE-alpha subunits, Swiss:O29501 [3].

Number of members: 12
[2194]

[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline: 92065982 "Structural motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE." Nature 1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of two large subdomains in TFIIE-alpha on the basis of homology between Xenopus and human sequences." Nucleic Acids Res 1992;20:5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343 "The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus." Nature 1997;390:364-370.

[2195] 876. (Transglut_core)

[2196] Cross-reference(s) PS00547; TRANSGLUTAMINASES

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze the cross-linking of proteins by promoting the formation of isopeptide bonds between the gamma-carboxyl group of a glutamine in one polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the conjugation of polyamines to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilizing the fibrin clot. Other forms of transglutaminases are widely distributed in various organs, tissues and body fluids. Sequence data is available for the following forms of TGase:

- Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis and important for the formation of the cornified cell envelope (gene TGM1).
- Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the cytoplasm (gene TGM2).
- Transglutaminase 3, responsible for the later stages of cell envelope formation in the epidermis and the hair follicle (gene TGM3).
- Transglutaminase 4 (gene TGM4).

[2197] A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The erythrocyte membrane band 4.2 protein, which probably plays an important role in regulating the shape of erythrocytes and their mechanical properties, is evolutionarily related to TGases. However, the active site cysteine is substituted by an alanine and the 4.2 protein does not show TGase activity.
[2198] Consensus pattern: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G [The first C is the active site residue] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
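
Because the third position of the signature may be either the active-site cysteine or the alanine found in band 4.2, a scan can report which residue occupies that position. The following is a minimal Python sketch, assuming the consensus pattern reads [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G (an assumption wherever the printed pattern is unclear). The function name and the short test fragments are hypothetical and were constructed only to satisfy the pattern.

```python
import re

# Regex form of the transglutaminase signature; the third position is
# captured because it holds the active-site residue (C) or its substitute (A).
TGASE = re.compile(r"[GT]Q([CA])WV.[SA][GA][IVT].{2}T.[LMSC]R[CSA][LV]G")


def scan_tgase_signature(seq):
    """Return (1-based position, residue) of the active-site position for each hit."""
    return [(m.start(1) + 1, m.group(1)) for m in TGASE.finditer(seq.upper())]


if __name__ == "__main__":
    # Hypothetical fragments: one with Cys, one with Ala at the active-site position.
    print(scan_tgase_signature("MSEGQCWVFAGVFNTFLRCLGKK"))
    print(scan_tgase_signature("MSEGQAWVFAGVFNTFLRCLGKK"))
```

A hit carrying "A" at the reported position matches the signature but, by the reasoning given for band 4.2, would not be expected to show TGase activity.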

[2199] [1] Ichinose A., Bottenus R.E., Davie E.W., J. Biol. Chem. 265:13411-13414(1990). [2] Greenberg C.S., Birckbichler P.J., Rice R.H., FASEB J. 5:3071-3077(1991).
[2200] 877. (TruB_N) TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to pseudouridine in most tRNAs. This family also includes Cbf5p that modifies rRNA [2].

Number of members: 33

[2201]

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 "Purification, cloning, and properties of the tRNA psi 55 synthase from Escherichia coli." RNA 1995;1:102-112.

[2] Lafontaine DL, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D; Medline: 98139521 "The box H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase." Genes Dev 1998;12:527-537.

[2202] 878. (UDPGP) UTP-glucose-1-phosphate uridylyltransferase

This family consists of UTP-glucose-1-phosphate uridylyltransferases, EC:2.7.7.9, also known as UDP-glucose pyrophosphorylase (UDPGP) and glucose-1-phosphate uridylyltransferase. UTP-glucose-1-phosphate uridylyltransferase catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose is an important intermediate in mammalian carbohydrate interconversion, involved in various metabolic roles depending on tissue type [1]. In Dictyostelium (slime mold), mutants in this enzyme abort the development cycle [2]. Also within the family are UDP-N-acetylglucosamine pyrophosphorylase Swiss:Q16222 or AGX1 [3] and two hypothetical proteins from Borrelia burgdorferi, the Lyme disease spirochaete, Swiss:O51893 and Swiss:O51036.

Number of members: 18

[2203]

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 "Sequence differences between human muscle and liver cDNAs for UDPglucose pyrophosphorylase and kinetic properties of the recombinant enzymes expressed in Escherichia coli." Eur J Biochem 1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 "Structure and sequence of a UDP glucose pyrophosphorylase gene of Dictyostelium discoideum." Nucleic Acids Res 1987;15:3891-3906.

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 "The eukaryotic UDP-N-acetylglucosamine pyrophosphorylases. Gene cloning, protein expression, and catalytic mechanism." J Biol Chem 1998;273:14392-14397.

[2204] 879. (UPF0044) Uncharacterized protein family UPF0044 signature
Cross-reference(s) PS01301; UPF0044
The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein yqeI.
- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.
- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 Kd. They can be picked up in the database by the following pattern. This pattern is located in the N-terminal part of these proteins.

[2205] Consensus pattern: L-[ST]-x(3)-K-x(3)-[KRHSGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
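
Consensus patterns written in this notation can be translated mechanically into regular expressions for database scanning. The following Python sketch is one possible translation, assuming the standard PROSITE conventions (elements separated by "-", "x" for any residue, square brackets for allowed residues, braces for excluded residues, and a repeat count or range in parentheses). The function name is hypothetical, anchors such as "<" and ">" are not handled, and the test sequence is synthetic.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern string into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: %r" % element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


UPF0044 = "L-[ST]-x(3)-K-x(3)-[KRHSGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G"

if __name__ == "__main__":
    print(prosite_to_regex(UPF0044))
```

The resulting string can be passed to `re.search` to locate the signature in the N-terminal part of a candidate sequence.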
[2206] 880. (zf-A20) A20-like zinc finger
A20- (an inhibitor of cell death)-like zinc fingers. The zinc finger mediates self-association in A20. These fingers also mediate IL-1-induced NF-kappa B activation.

Number of members: 22

[2207]

[1] Heyninck K, Beyaert R; Medline: 99126071 "The cytokine-inducible zinc finger protein A20 inhibits IL-1-induced NF-kappaB activation at the level of TRAF6." FEBS Lett 1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline: 96390831 "A20, an inhibitor of cell death, self-associates by its zinc finger domain." FEBS Lett 1996;384:61-64.

[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 "The tumor necrosis factor-inducible zinc finger protein A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation." Proc Natl Acad Sci U S A 1996;93:6721-6725.
[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 "The A20 cDNA induced by tumor necrosis factor alpha encodes a novel type of zinc finger protein." J Biol Chem 1990;265:14705-14708.

[2208] 881. (zf-PARP)

Poly(ADP-ribose) polymerase zinc finger domain

Cross-reference(s) PS00347; PARP_ZN_FINGER_1 PS50064; PARP_ZN_FINGER_2

[2209] Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that catalyzes the covalent attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular processes such as differentiation, proliferation and tumor transformation as well as in the regulation of the molecular events involved in the recovery of the cell from DNA damage. Structurally, PARP, about 1000 amino-acid residues long, consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central automodification domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of zinc finger domains which have been shown to bind DNA in a zinc-dependent manner. The zinc finger domains of PARP seem to bind specifically to single-stranded DNA. DNA ligase III [3] contains, in its N-terminal section, a single copy of a zinc finger highly similar to those of PARP.

[2210] Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The three C's and the H are zinc ligands] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE. Sequences known to belong to this class detected by the profile: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
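
The PARP zinc finger pattern contains a variable-length spacer, x(16,18), and four residues that act as zinc ligands. The following Python sketch shows one way to locate those ligands, assuming the pattern reads C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C (an assumption wherever the printed pattern is unclear). The function name and the synthetic test sequence are hypothetical.

```python
import re

# Regex form of the PARP zinc finger signature. The three Cys and the His
# that serve as zinc ligands are captured; x(16,18) becomes .{16,18}.
PARP_ZF = re.compile(r"(C)[KR].(C).{3}I.K.{3}[RG].{16,18}W[FYH](H).{2}(C)")


def zinc_ligand_positions(seq):
    """Return 1-based positions of the C, C, H, C ligands of the first hit, or None."""
    m = PARP_ZF.search(seq.upper())
    if m is None:
        return None
    return [m.start(i) + 1 for i in range(1, 5)]


if __name__ == "__main__":
    # Synthetic sequence with a 17-residue spacer, built only to satisfy the pattern.
    demo = "MM" + "CKACAAAIAKAAAR" + "A" * 17 + "WFHAAC" + "MM"
    print(zinc_ligand_positions(demo))
```

As the note in the text indicates, the profile is more sensitive than the patterns, so a regex scan of this kind is best treated as a quick first filter.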

[2211] Note: This documentation entry is linked to both signature patterns and a profile. As the profile is much more sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.

[1] Althaus F.R., Richter C.R., Mol. Biol. Biochem. Biophys. 37:1-126(1987).

[2] de Murcia G., Menissier de Murcia J., Trends Biochem. Sci. 19:172-176(1994).

[3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-P., Shell B.K., Nash R.A., Schar P., Barnes D.E., Haseltine W.A., Lindahl T., Mol. Cell. Biol. 15:3206-3216(1995).

A. Asparaginase 2

[2212] Asparaginase II (L-asparagine aminohydrolase II) is an extracellular protein that may be associated with the cell wall and whose expression is affected by the availability of nitrogen. Asparaginase II catalyzes the reaction L-asparagine + H2O = L-aspartate + NH3. As many leukemias have high requirements for aspartic acid, asparaginase II proteins are useful as reagents for screening compounds for activity as leukemia chemotherapy products. Asparaginase II protein can also be over- or under-expressed to alter amino acid content in plant tissues or to modify nitrogen fixation and/or nitrogen metabolism in plants.

[2213] Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12

B. Chloroa_b-bind

[2214] Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast and bind chlorophyll a and chlorophyll b, thereby triggering a chemical reaction (photosynthesis). These proteins are useful in controlling the rate, efficiency and/or output of photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to increase the rate of photosynthesis.

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64; Brandt et al. (1992) Plant Mol Biol 19: 699-703
C. DMRL synthase

[2215] DMRL Synthase (6,7-Dimethyl-8-Ribityllumazine Synthase) catalyzes the last step in riboflavin (Vitamin B2) synthesis, condensing 5-amino-6-(1'-D)-ribityl-amino-2,4(1H,3H)-pyrimidinedione with L-3,4-dihydroxy-2-butanone 4-phosphate to produce 6,7-dimethyl-8-(1-D-ribityl)lumazine. The enzyme forms a homopentamer. Engineering of these proteins or those with homologous sequences/structures may allow control of the amounts of vitamin B2 available in plants and/or accumulation of pigment, as well as altering reactions requiring hydrogen ion carriers/transmitters.
Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7

D. E1 N

[2216] These proteins are ATP-dependent DNA helicases that are required for initiation of viral DNA replication. They
form a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin that contains binding sites
for both proteins. The majority of sequences known for this group of proteins are from various papillomaviruses, a type
of double stranded DNA virus. In plants, the prototype double stranded DNA virus is Cauliflower Mosaic virus (CaMV).
Manipulation of these proteins, especially to produce variant proteins that form non-productive complexes, enables
production of plants that are resistant to infection by double stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90

Ustav and Stenlund (1991) EMBO J 10: 449-57

Callaway et al. (1996) Mol Plant Microbe Interact 9: 810

E. EF1_G 

[2217] Elongation Factor-1 is composed of four subunits: alpha, beta, delta and gamma. Gamma subunits are presumed
to play a role in anchoring the complex to other cellular components. Studies of EF-1 genes in plants suggest
that different forms of the EF-1 subunits may be expressed in particular organs or in response to stress. Manipulation
of the activity of these proteins, either by altered expression level or by structural mutation, may result in the
accumulation of a particular protein in a chosen organ or allow production of particular proteins during stress conditions.

Ref: Kinzy et al. (1994) NAR 22: 2703-7; Dunn et al. (1993) Plant Mol Biol 23: 221-5; Aguilar et al. (1991) Plant Mol
Biol 17: 351-60

F. ENV_polyprotein

[2218] This family comprises the envelope or coat proteins known from a number of different retroviruses. In mammalian
species, retroviruses are responsible for diseases such as leukemia and HIV. In plants, retroviruses are known
in both monocot (e.g. Zeon-1) and dicot (e.g. Arabidopsis and tobacco) species and have been shown to induce mutant
alleles at new loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous or introduced
retroviruses, in essence generating a new method for mutant production, gene tagging and the like.

Ref: Mamoun et al. (1990) J Virol 64: 4180-8; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas (1998)
Genetics 149: 703-15

G. Glycosyl_hydr9

[2219] Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) catalyze the
endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant proteins with this domain exist and are
expressed in an organ specific manner. They are involved in the fruit ripening process, in cell elongation and plant
reproduction. Modulation of the activity of these proteins, either by over- or under-expression or by mutation of the
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or for engineering reproductive
sterility.

Ref: Giorda et al. (1990) Biochemistry 29: 7264-9; Tucker et al. (1988) Plant Physiol 88: 1257-62; Shani et al. (1997)
43: 837-42; Milligan and Gasser (1995) Plant Mol Biol 28: 691-711

H. Glycosyl_hydr14

[2220] The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 1,4-α-glucosidic linkages in
polysaccharides and remove successive maltose units from the non-reducing ends of the chains. Mutants of β-amylase
in Arabidopsis exhibited altered degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also influence the amount of
pigment stored within particular cells. Manipulation of the β-amylase genes enables control of plant pigmentation (for
example, fibre pigment in cotton) as well as carbohydrate synthesis and degradation.

Ref: Zeeman et al. (1998) Plant J 15: 357-65; Hirano and Nakamura (1997) Plant Physiol 114: 5675-82; Kitamoto et
al. (1988) J Bacteriol 170: 5848-54

I. Glycosyl_hydr15

[2221] Glycosyl hydrolases from family 15 (such as 1,4-Alpha-D-Glucan glucohydrolase) catalyze the hydrolysis of
terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains, resulting in the
release of β-D-Glucose. In plants these proteins have been tied to the mobilization of the xyloglucan stored in the
cotyledonary cell walls. Proteins such as these could be varied to affect the rate of plant growth (for example during
germination), storage and/or use of glucose and other sugars by plant tissues and alteration of the properties, such
as elasticity, of plant cell walls.

Ref: Crombie et al. (1998) Plant J 15: 27-38; Hata et al. (1991) Agric Biol Chem 55: 941-9

J. Glycosyl_hydr20

[2222] Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal non-reducing
N-acetyl-D-hexosamine residues in N-acetyl-β-D-hexosaminides. N-acetyl-β-glucosaminidase belongs to this family and
exists in several different forms (consisting of various combinations of alpha and beta chains) depending on the organism.
Family 20 glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff disease) and
glycogen storage disease in humans. These types of proteins are also responsible for the hydrolysis of chitin. In plants,
these proteins could be useful in controlling carbohydrate catabolism, thereby influencing the amount of sugars
available for storage and/or use in other metabolic pathways. In addition, it is possible that such proteins could be used
to engineer an endogenous insect protection mechanism, e.g. by secretion of a chitin-hydrolyzing composition by the
plant.

Ref: Graham et al (1988) J Biol Chem 263: 16823-9; O'Dowd et al. (1988) Biochemistry 27: 5216-26

K. HMG box

[2223] The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. Numerous plant
proteins contain this domain, such as the HMG1/2-like proteins. The expression of some of these HMG proteins appears
to be regulated by circadian rhythms and in a light dependent manner, occurring at higher levels in roots, for example,
and lower levels in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence
transcription regulation. In plants, HMGs are believed to have a role in maintaining patterns of circadian-regulated
expression for other genes, suggesting that these proteins could be exploited to control growth and development.

Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501; Zheng et al. (1993) Plant Mol Biol 23: 813-23; Grasser et
al (1993) Plant Mol Biol 23: 619-25

L. IL2

[2224] Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic stimulation and
is crucial for proper regulation and functioning of the immune response. IL-2 is capable of stimulating B cells, monocytes,
lymphokine-activated killer cells, natural killer cells and glioma cells. Plant extracts have also been shown to stimulate
the immune system (for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in feedback
inhibition pathways that impact the inflammatory response as well as the growth inhibition of tumor reactive T cells.
Plant proteins containing IL-2-like sequences are useful as immunity-based therapeutics, acting in a manner similar
to IL-2 in mammals.

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6; Ariel et al. (1998) J Immunol 161: 2465-72; Schink (1997)
Anticancer Drugs 8 Suppl 1: S47-51

M. Oxidored FMN 

[2225] NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced acceptor. One member
of this family is yeast 'old yellow enzyme' (OYE) and is thought to be involved in oxylipin metabolism. A second
yeast family member is a protein that binds estrogen binding protein (EBP) in addition to exhibiting oxidoreductase
activity. An Arabidopsis homolog to OYE has been described and estrogen binding proteins in plants have been
reported. Plant proteins from this class have the potential to be used to modify lipid metabolism/catabolism. These
proteins may also have use as therapeutics for breast and prostate cancer, and other abnormal growth in steroid-sensitive
tissues.

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21; Schaller and Weiler (1997) J Biol Chem 272: 28066-72;
Madani et al. (1994) PNAS USA 91: 922-6

N. Oxidored q2

[2226] The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone = NAD(+) + plastoquinol.
In plants these reactions occur in the chloroplast and are believed to participate in a chloroplast respiratory
system. Here, the NDH complex is postulated to act as a valve to remove excess reduction equivalents in the
chloroplasts. Manipulation of these proteins may improve the rate or efficiency of photosynthesis.

Ref: Burrows et al. (1998) EMBO J 17: 868-76; Kofer et al (1998) Mol Gen Genet 258: 166-73; Maier et al. (1995) J
Mol Biol 251: 614-28

O. PABP

[2227] Polyadenylate binding proteins bind the poly(A) tail of mRNA. Plants, as exemplified by Arabidopsis, contain
numerous PABP genes that are expressed in an organ-specific manner. For example, PABP2 is functional in roots and
shoots, while PABP5 is expressed predominantly in immature flowers. The PABP proteins are implicated in numerous
aspects of posttranscriptional regulation including mRNA turnover and translational initiation. Control of activity of PABP
proteins provides the ability to control the expression of various genes in particular organs during development.

Ref: Hilson et al (1993) Plant Physiol 103: 525-33; Belostotsky and Meagher (1993) PNAS USA 90: 6686-90

P. Parvo coat

[2228] Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid proteins. Plants
are susceptible to infection by single stranded DNA viruses such as Maize streak virus (MSV) and various Gemini
viruses. The coat proteins in these plant viruses are critical to the virus life cycle within the plant. For example, the coat
protein of MSV is thought to be involved in intra- and inter-cellular movement within the plant. Engineering of proteins
having similarity to parvoviral coat proteins, especially to produce proteins that interfere with maturation of the virus
particle, enables the production of plants having better resistance to natural plant single-stranded DNA viruses.

Ref: Liu et al. (1997) J Gen Virol 78: 1265-70; Rohde et al. (1990) Virology 176: 648-51

Q. Pkinase C

[2229] Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and are known to
undergo serine-specific autophosphorylation and specifically phosphorylate two ribosomal proteins, P14 and P16.
During development, these proteins predominate during high metabolic activity in growing buds, root tips, leaf margins
and germinating seeds. They are thought to be involved in the control of plant growth and development. In addition,
two genes encoding proteins from this family have been described that help plant cells adapt during cold or high salt
stresses. Consequently, engineering Pkinase C proteins provides a way to control general growth/development of the
plant as well as a means to provide endogenous protection against environmental stresses.

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92; Mizoguchi et al. (1995) FEBS Lett 358: 199-204

R. REV

[2230] The REV proteins act post-transcriptionally to relieve negative repression of GAG and ENV production in
retroviruses such as Human Immunodeficiency Virus type I (HIV-1). Plants contain retrovirus-like viruses such as
pararetroviruses and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in
particular have been used to create mutations at various loci, thereby permitting gene isolation, gene tagging and the like.
Manipulation of plant REV proteins enables control of transposition frequencies of corresponding transposable
elements and provides a new tool for genetic engineering of plants.

Ref: Sodroski et al. (1986) Nature 321: 412-7; Franchini et al. (1989) PNAS USA 86: 2433-7; Marquet et al. (1995)
77: 113-24; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas (1998) Genetics 149: 703-15

S. RuBisCo small 

[2231] Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the C3 photosynthetic
carbon reduction cycle, adding carbon dioxide to D-ribulose 1,5-bisphosphate to form two molecules of
3-phospho-D-glycerate. RuBisCo is comprised of two subunits, one large which is synthesized in the chloroplast, and one
small which is synthesized in the cytoplasm and then transported into the chloroplast. The expression of the small
subunit of RuBisCo is light regulated. Manipulation of these proteins could increase the efficiency of photosynthesis
or allow alterations in developmental timing.

Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93; Dedonder et al. (1993) Plant Physiol 101: 801

T. Sialyltransf

[2232] Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family catalyze the
following reaction:

CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R = CMP +
α-N-acetylneuraminyl-2,3-β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R.

These proteins are thought to be responsible for the synthesis of the sequence NeuAc-α-2,3-Gal-β-1,3-GalNAc-
found on sugar chains O-linked to threonine or serine and also as a terminal sequence on certain gangliosides in
mammalian cells. In plants, glycosyltransferases in the Golgi apparatus synthesize cell wall polysaccharides and
elaborate the complex glycans of glycoproteins. Engineering of plant sialyltransferases allows targeting of proteins
to particular cellular locations or enables the making of changes in cell wall structure.

Ref: Wee et al. (1998) Plant Cell 10: 1759-68; Lee et al. (1994) J Biol Chem 269: 10028-33; Kitagawa and Paulson
(1994) J Biol Chem 269: 1394-401

U. Signal

[2233] Many plant proteins in this family contain sequences similar to those found in both components of the
prokaryotic family of signal transducers known as the two-component systems. This suggests that activation may require a
transfer of a phosphate group between the transmitter domain and the receiver domain. One family member in
Arabidopsis appears to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family appear
to be involved in the regulation of gene transcription under conditions of environmental stress. Signal proteins can be
exploited to affect plant growth and development and/or control plant responses to stress conditions such as cold,
nutrient availability, etc.

Ref: Chang et al. (1993) Science 262: 539-44; Nagaya et al. (1993) Gene 131: 119-124; Gottfert et al. (1990) PNAS
USA 87: 2680-4

V. vMSA 

[2234] vMSA proteins are major surface antigens present on the envelope of various retroviruses. Surface antigens
of retroviruses are often involved in tropism of the virus. Plants contain retrovirus-like viruses such as
pararetroviruses and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in particular have
been used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the like. Manipulation
of plant vMSA proteins enables control of tropism of plant retroviruses that might be used as genetic engineering tools,
thus enabling targeting of the virus to particular species and/or tissues of plants.

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas
(1998) Genetics 149: 703-15

W. zf-CCCH 

[2235] This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger domains. These proteins
cover a broad range of functions. For example, the COP1 protein acts as a repressor of photomorphogenesis in
darkness; light stimuli abolish this suppressive action. In addition, COP1 protein can function as a negative transcriptional
regulator capable of direct interaction with components of the G-protein signaling pathway. As a second example, a
zf-CCCH protein identified in Arabidopsis appears to be involved in the resistance to DNA damage induced by UV light
and chemical DNA-damaging agents. Overexpression of this class of proteins permits production of plants that are
better suited to adverse environments. Manipulation of expression of zf-CCCH proteins functioning as transcriptional
regulators, such as COP1, enables manipulation of some signal transduction pathways.

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53; Deng et al. (1992) Cell 71: 791-801
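
As an illustration only, and not as part of the disclosed methods, the CX(8)CX(5)CX(3)H spacing named in the zf-CCCH
paragraph can be written as a regular expression and used to scan a protein sequence. This is a minimal sketch that
assumes one-letter amino acid codes and treats X as any residue; the names `find_ccch_fingers` and
`has_two_ccch_fingers` are hypothetical.

```python
import re

# C-x(8)-C-x(5)-C-x(3)-H, with x taken to mean any one-letter residue code.
CCCH_PATTERN = re.compile(r"C[A-Z]{8}C[A-Z]{5}C[A-Z]{3}H")


def find_ccch_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for each non-overlapping match."""
    seq = protein.upper()
    return [(m.start(), m.group(0)) for m in CCCH_PATTERN.finditer(seq)]


def has_two_ccch_fingers(protein: str) -> bool:
    """True if at least two non-overlapping CX8CX5CX3H motifs are present."""
    return len(find_ccch_fingers(protein)) >= 2
```

A spacing match of this kind is only a first-pass filter; it says nothing about whether the matched region actually
folds as a zinc finger.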

X. zf-RanBP 

[2236] Proteins falling within this category contain many X-X-F-G and X-F-X-F-G repeats, and may contain
RANBP1-like or PPIase domains. Plant proteins having domains similar to these include PAS1 and GMSTI. PAS1 has
been shown to have dramatic developmental effects that appear to be correlated with both cell division and cell wall
elongation. GMSTI has high identity to the yeast STI stress-inducible gene and has been shown to be heat inducible.
Proteins such as these may be useful for controlling growth and form of development.

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43; Hernandez Torres et al. (1995) 27: 1221-6
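
The X-X-F-G and X-F-X-F-G repeats mentioned for this family can be tallied with a short script. The following is a
hedged sketch rather than a method taken from the specification: it assumes one-letter amino acid codes, treats X as
any residue, and uses a hypothetical helper name, `count_fg_repeats`. Note that every X-F-X-F-G occurrence also
contains an X-X-F-G occurrence, so the two counts are not independent.

```python
import re

# Zero-width lookaheads so that overlapping repeats are each counted once.
XXFG = re.compile(r"(?=[A-Z]{2}FG)")
XFXFG = re.compile(r"(?=[A-Z]F[A-Z]FG)")


def count_fg_repeats(protein: str) -> dict[str, int]:
    """Count X-X-F-G and X-F-X-F-G repeats in a one-letter protein sequence."""
    seq = protein.upper()
    return {
        "XXFG": sum(1 for _ in XXFG.finditer(seq)),
        "XFXFG": sum(1 for _ in XFXFG.finditer(seq)),
    }
```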

Y. Peptidase M48

[2237] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are located
in the membranes of the endoplasmic reticulum. They function in NH2-terminal proteolytic processing, as shown for
the yeast STE24 gene product. This gene is required for the correct processing of a-factor, a yeast pheromone. Family
M48 peptidases also appear to be required for some prenylation reactions, mediating COOH-terminal CAAX processing.
Prenylation reactions are believed to be involved in the regulation of protein-protein and protein-membrane
interactions. As an example, RAS GTPase activity is regulated in part by localization to the inner side of the plasma
membrane upon prenylation. In plants, proteins from this family could be involved in pollen-stigma interactions such as
those mediating self-pollination vs. outcrossing, or could be members of several secondary metabolism pathways.

Ref: Fujimura-Kamada et al. (1997) J Cell Biol 136: 271-85; Tam et al. (1998) J Cell Biol 142: 635-49

Z. DNA pol Viral N

[2238] The DNA pol Viral N domain is located at the N-terminal region of DNA polymerase isolated from several
retroid viruses such as the Cauliflower Mosaic Virus. The domain motif has also been found in numerous other species
from humans to cyanobacteria. In these organisms, this motif seems to be associated with two types of sequences:
retrotransposons and mitochondrial genes. In the mitochondrial sequences this domain is potentially involved in the
self-splicing conducted by group II introns. Various manipulations of this gene in plants allow control of the numerous
retrotransposons endogenous to plant genomes or allow engineering of mitochondrial function, especially to increase
efficiency of energy utilization by cells.

Ref: Chapdelaine and Bonen (1991) Cell 65: 465-72; Ferat and Michel (1993) Nature 364: 358-61; Wilson et al. (1994)
368: 32-8; Cambareri et al. (1994) 242: 658-65; Gardner et al. (1981) NAR 9: 2871-2888; Cummings et al.
(1990) Curr Genet 17: 375-402; Hattori et al. (1986) Nature 321: 625-8

Aa. Calpain_inhib

[2239] This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain is a non-lysosomal
calcium-dependent intracellular protease that appears to be involved in the dynamic changes of the cytoskeleton, especially
actin-related structures, during early Drosophila embryogenesis [1]. Calpastatins co-exist in cells with calpains and the
subcellular distribution of calpastatin is thought to be important to calpain regulation [2]. In plants calpains and
calpastatins could be involved in embryogenesis and non-embryogenic organ reiteration. Mutations occurring in calpain
inhibitor repeat domains would produce developmental abnormalities such as abnormal leaf, root or flower
development.

[2240] Refs

1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42.

2 Mellgren RL, Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77.

Ab. chorismate bind

[2241] Chorismate binding domains are present in plant anthranilate synthase (AS) genes. AS genes catalyze the
first step in the biosynthesis of tryptophan by converting chorismate and L-glutamine to anthranilate, pyruvate and
L-glutamate. Some of these genes are involved in feedback inhibition by tryptophan [1] while some are feedback
insensitive [2]. In Arabidopsis, two AS genes have overlapping, but different distributions. One of these AS genes is induced
by wounding and bacterial pathogen infiltration [1]. Mutations in the chorismate binding domain would affect the
production of tryptophan and could influence the plant's defense system. AS gene products can be used for in vitro
synthesis of tryptophan and tryptophan derivatives.

[2242] Refs

1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33.

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43.

Ac. late protein L2

[2243] Papillomaviruses are encapsulated double stranded DNA viruses. Plants are susceptible to infection by
double stranded DNA viruses such as Cauliflower Mosaic virus (CaMV). The coat proteins in these plant viruses are critical
to the virus life cycle within the plant. For example, the coat protein of CaMV is thought to be involved in intra- and
inter-cellular movement within the plant [1]. Engineering of proteins having similarity to papillomavirus coat proteins
may enable the production of plants having better resistance to natural plant double stranded DNA viruses.

[2244] Refs

1 Thompson SR, Melcher U (1993) J Gen Virol 74: 1141-8.

Ad. Peptidase M41

[2245] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are integral
membrane proteins. They seem to be involved in the degradation of carboxy-terminal-tagged cytoplasmic proteins. In
plants, these proteins are located in the thylakoid membranes of the chloroplasts, their expression is light regulated
and they are thought to be involved in degradation of soluble stromal proteins and turn-over of thylakoid proteins [1].
Manipulation of expression and structure of these proteins would have effects on the efficiency of photosynthesis and
the development of chloroplasts.

[2246] Refs

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol Chem 271: 29329-34.

Ae. UPF0051

[2247] There is some evidence that, in plants, proteins in this family are involved in ATP synthesis in chloroplasts
[1, 2]. Mutations in these proteins or altering their expression would affect the efficiency of photosynthesis and energy
production.

[2248] Refs

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70.

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76.

Af. E7

[2249] Papillomaviruses are encapsulated double stranded DNA viruses. The Papillomavirus early protein 7 (E7) is
known as a potent immortalizing and transforming agent. Transformation by E7 is thought to be mediated by the physical
association of E7 with cellular proteins regulating entry into the cell cycle [1]. The result is entry into the cell cycle and
suppression of terminal differentiation in mammalian cells. Thus, engineering of proteins having similarity to
papillomavirus E7 protein enables the production of plants having altered cellular proliferation characteristics and possibly
altered morphology. For example, overexpression of E7-like proteins would be expected to result in proliferation of
cells of the tissue in which the E7 protein is expressed, perhaps with suppression of differentiation events. Thus, for
example, overexpression of E7-like proteins in meristem cells can result in taller plants and suppression of leafing and/
or flowering.

[2250] Refs

1 Zwerschke W, Jansen-Durr P Adv Cancer Res 2000;78:1-29

Ag. Peptidase U7

[2251] This protein is known to be an integral membrane protein in the cyanobacterium Synechocystis where it
functions to digest cleaved signal peptides [1]. This activity is necessary to maintain proper secretion of mature proteins
across the membrane. In higher plants this protein may be present in the plastid or chloroplast membranes where it
would function by enabling protein movement into and out of the chloroplasts. Mutations in this protein would be
expected to affect the development of plastids, including chloroplasts, or alter the energy transfer system within the
chloroplasts, thereby affecting growth and development.

[2252] Refs

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S,
Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T,
Watanabe A, Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36.

A. Activities of Polypeptides Comprising Signal Peptides

[2253] Polypeptides comprising signal peptides are a family of proteins that are typically targeted (1) to a particular
organelle or intracellular compartment, (2) to interact with a particular molecule or (3) for secretion outside of a host cell.
Examples of polypeptides comprising signal peptides include, without limitation, secreted proteins, soluble proteins,
receptors, proteins retained in the ER, etc.

[2254] These proteins comprising signal peptides are useful to modulate ligand-receptor interactions, cell-to-cell
communication, signal transduction, intracellular communication, and activities and/or chemical cascades that take
part in an organism outside or within any particular cell.

[2255] One class of such proteins is soluble proteins which are transported out of the cell. These proteins can act
as ligands that bind to a receptor to trigger signal transduction or to permit communication between cells.

[2256] Another class is receptor proteins which also comprise a retention domain that lodges the receptor protein in
the membrane when the cell transports the receptor to the surface of the cell. Like the soluble ligands, receptors can
also modulate signal transduction and communication between cells.

[2257] In addition, the signal peptide itself can serve as a ligand for some receptors. An example is the interaction of
the ER targeting signal peptide with the signal recognition particle (SRP). Here, the SRP binds to the signal peptide,
halting translation, and the resulting SRP complex then binds to docking proteins located on the surface of the ER,
prompting transfer of the protein into the ER.

[2258] The residue composition of signal peptides is described below in Subsection IV.C.1.

III. Methods of Modulating Polypeptide Production 

[2259] It is contemplated that polynucleotides of the invention can be incorporated into a host cell or in-vitro system
to modulate polypeptide production. For instance, the SDFs prepared as described herein can be used to prepare
expression cassettes useful in a number of techniques for suppressing or enhancing expression.

[2260] For example, polynucleotides of the present invention comprising sequences to be transcribed, such as coding
sequences, can be inserted into nucleic acid constructs to modulate polypeptide production. Typically, such
sequences to be transcribed are heterologous to at least one element of the nucleic acid construct to generate a chimeric
gene or construct.

[2261] Another example of useful polynucleotides is nucleic acid molecules comprising regulatory sequences of
the present invention. Chimeric genes or constructs can be generated when the regulatory sequences of the invention
are linked to heterologous sequences in a vector construct. Within the scope of the invention are such chimeric genes
and/or constructs.

[2262] Also within the scope of the invention are nucleic acid molecules, whereof at least a part or fragment of these
DNA molecules is presented in REF AND SEQ TABLES 1 AND 2 of the present application, and wherein the coding
sequence is under the control of its own promoter and/or its own regulatory elements. Such molecules are useful for
transforming the genome of a host cell or an organism regenerated from said host cell for modulating polypeptide
production.

[2263] Additionally, a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the
oligonucleotide.

[2264] A more detailed description of components to be included in vector constructs is provided both above and
below.

[2265] Whether the chimeric vectors or native nucleic acids are utilized, such polynucleotides can be incorporated
into a host cell to modulate polypeptide production. Native genes and/or nucleic acid molecules can be effective when
exogenous to the host cell.

[2266] Methods of modulating polypeptide expression include, without limitation:
Suppression methods, such as

Antisense
Ribozymes
Co-suppression
Insertion of Sequences into the Gene to be Modulated
Regulatory Sequence Modulation

as well as Methods for Enhancing Production, such as

Insertion of Exogenous Sequences; and
Regulatory Sequence Modulation.

III.A. Suppression

[2267] Expression cassettes of the invention can be used to suppress expression of endogenous genes which
comprise the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit
(Oeller et al., Science 254:437 (1991)) or to influence seed size (WO98/07842) or to provoke cell ablation (Mariani et
al., Nature 357: 384-387 (1992)).

[2268] As described above, a number of methods can be used to inhibit gene expression in plants, such as antisense,
ribozyme, introduction of exogenous genes into a host cell, insertion of a polynucleotide sequence into the coding
sequence and/or the promoter of the endogenous gene of interest, and the like.

III.A.1. Antisense

[2269] An expression cassette as described above can be transformed into a host cell or plant to produce an antisense
strand of RNA. For plant cells, antisense RNA inhibits gene expression by preventing the accumulation of mRNA which
encodes the enzyme of interest, see, e.g., Sheehy et al., Proc. Nat. Acad. Sci. USA, 85:8805 (1988), and Hiatt et al.,
U.S. Patent No. 4,801,340.
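
To make the relationship between a coding sequence and its antisense transcript concrete, the following minimal
sketch computes the antisense RNA for a sense-strand DNA fragment. It is an illustration under stated assumptions,
not the construct design used in the cited references: the input is assumed to be the sense strand written 5' to 3'
using only A, C, G and T, and the function name `antisense_rna` is hypothetical.

```python
# Watson-Crick partners for the DNA sense strand.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_rna(sense_dna: str) -> str:
    """Return the antisense RNA (5'->3') for a sense-strand DNA given 5'->3'."""
    seq = sense_dna.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sense_dna must contain only A, C, G and T")
    # Reverse complement of the sense strand, then T -> U for RNA.
    reverse_complement = seq.translate(_DNA_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")
```

For example, the sense fragment ATGGCC corresponds to the mRNA AUGGCC, and the sketch returns GGCCAU, which
base-pairs with that mRNA in antiparallel orientation.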

III.A.2. Ribozymes

[2270] Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation.

III.A.3. Co-Suppression

[2271] Another method of suppression is by introducing an exogenous copy of the gene to be suppressed.
Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter
has been shown to prevent the accumulation of mRNA. A detailed description of this method is provided above.

III.A.4. Insertion of Sequences into the Gene to be Modulated

[2272] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to
disrupt transcription or translation of the gene.

[2273] Homologous recombination could be used to target a polynucleotide insert to a gene using the Cre-Lox system
(A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998), A.C. Vergunst et al., Plant Mol. Biol. 38:393 (1998), H. Albert
et al., Plant J. 7:649 (1995)).

[2274] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene
of interest. Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997). In this method, screening for clones from a library
containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of
interest. Such screening can be performed using probes and/or primers described above based on sequences from REF
AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequence thereto. The screening can also
be performed by selecting clones or any transgenic plants having a desired phenotype.

III.A.5. Regulatory Sequence Modulation

[2275] The SDFs described in REF and SEQ TABLES 1 and 2, and fragments thereof, are examples of nucleotides
of the invention that contain regulatory sequences that can be used to suppress or inactivate transcription and/or
translation from a gene of interest as discussed in I.C.5.

III.A.6. Genes Comprising Dominant-Negative Mutations

[2276] When suppression of production of the endogenous, native protein is desired it is often helpful to express a
gene comprising a dominant-negative mutation. Production of protein variants from genes comprising dominant-negative
mutations is a useful tool for research. Genes comprising dominant-negative mutations can produce a
variant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native
result. Consequently, overexpression of genes comprising these mutations can titrate out an undesired activity of the
native protein. For example, the product from a gene comprising a dominant-negative mutation of a receptor can be
used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and
thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising from the gene comprising a
dominant-negative mutation can be an inactive enzyme still capable of binding to the same substrate as the native
protein and therefore competes with such native protein.

[2277] Products from genes comprising dominant-negative mutations can also act upon the native protein itself to
prevent activity. For example, the native protein may be active only as a homo-multimer or as one subunit of a hetero-
multimer. Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity.

[2278] Thus, gene function can be modulated in host cells of interest by insertion into these cells of vector constructs
comprising a gene comprising a dominant-negative mutation.

III.B. Enhanced Expression

[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an
exogenous gene; or (2) promoter modulation.

III.B.1. Insertion of an Exogenous Gene

[2280] Insertion of an expression construct encoding an exogenous gene can boost the number of gene copies
expressed in a host cell.

[2281] Such expression constructs can comprise genes that either encode the native protein that is of interest or
that encode a variant that exhibits enhanced activity as compared to the native protein. Such genes encoding proteins
of interest can be constructed from the sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and
substantially similar sequence thereto.

[2282] Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host
organism or a promoter that directs transcription only in particular cells or times during a host cell life cycle or in response
to environmental stimuli.

III.B.2. Regulatory Sequence Modulation

[2283] The SDFs of REF and SEQ TABLES 1 AND 2, and fragments thereof, contain regulatory sequences that can
be used to enhance expression of a gene of interest. For example, some of these sequences contain useful enhancer
elements. In some cases, duplication of enhancer elements or insertion of exogenous enhancer elements will increase
expression of a desired gene from a particular promoter. As other examples, all promoters require binding of a
regulatory protein to be activated, while some promoters may need a protein that signals a promoter binding protein
to expose a polymerase binding site. In either case, over-production of such proteins can be used to enhance
expression of a gene of interest by increasing the activation time of the promoter.

[2284] Such regulatory proteins are encoded by some of the sequences in REF AND SEQ TABLES 1 AND 2,
fragments thereof, and substantially similar sequences thereto.

[2285] Coding sequences for these proteins can be constructed as described above.

IV. Gene Constructs and Vector Construction

[2286] To use isolated SDFs of the present invention or a combination of them or parts and/or mutants and/or fusions
of said SDFs in the above techniques, recombinant DNA vectors which comprise said SDFs and are suitable for
transformation of cells, such as plant cells, are usually prepared. The SDF construct can be made using standard
recombinant DNA techniques (Sambrook et al. 1989) and can be introduced to the species of interest by Agrobacterium-
mediated transformation or by other means of transformation (e.g., particle gun bombardment) as referenced below.

[2287] The vector backbone can be any of those typical in the art such as plasmids, viruses, artificial chromosomes,
BACs, YACs and PACs and vectors of the sort described by

(a) BAC: Shizuya et al., Proc. Natl. Acad. Sci. USA 89: 8794-8797 (1992); Hamilton et al., Proc. Natl. Acad. Sci.
USA 93: 9975-9979 (1996);

(b) YAC: Burke et al., Science 236: 806-812 (1987);

(c) PAC: Sternberg N. et al., Proc Natl Acad Sci U S A. Jan;87(1): 103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl Acids Res 23: 4850-4856 (1995);

(e) Lambda Phage Vectors: Replacement Vector, e.g., Frischauf et al., J. Mol Biol 170: 827-842 (1983); or Insertion
vector, e.g., Huynh et al., In: Glover NM (ed) DNA Cloning: A Practical Approach, Vol. 1, Oxford: IRL Press (1985);

(f) T-DNA gene fusion vectors: Walden et al., Mol Cell Biol 1: 175-194 (1990); and

(g) Plasmid vectors: Sambrook et al., infra.

[2288] Typically, a vector will comprise the exogenous gene, which in its turn comprises an SDF of the present
invention to be introduced into the genome of a host cell, and which gene may be an antisense construct, a ribozyme
construct, chimeraplast, or a coding sequence with any desired transcriptional and/or translational regulatory
sequences, such as promoters, UTRs, and 3' end termination sequences. Vectors of the invention can also include origins of
replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.

[2289] A DNA sequence coding for the desired polypeptide, for example a cDNA sequence encoding a full length
protein, will preferably be combined with transcriptional and translational initiation regulatory sequences which will
direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.

[2290] For example, for over-expression, a plant promoter fragment may be employed that will direct transcription
of the gene in all tissues of a regenerated plant. Alternatively, the plant promoter may direct transcription of an SDF of
the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental
control (inducible promoters).

[2291] If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the coding region is
typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes,
or from T-DNA.

[2292] The vector comprising the sequences from genes or SDFs of the invention may comprise a marker gene that
confers a selectable phenotype on plant cells. The vector can include promoter and coding sequence, for instance.
For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to
kanamycin, G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to chlorosulfuron or phosphinothricin.

IV.A. Coding Sequences 

[2293] Generally, the sequence in the transformation vector and to be introduced into the genome of the host cell
does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full
length, relative to either the primary transcription product or fully processed mRNA. Furthermore, the introduced
sequence need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can
be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be
produced.

IV.B. Promoters 

[2294] As explained above, introducing an exogenous SDF from the same species or an orthologous SDF from
another species can modulate the expression of a native gene corresponding to that SDF of interest. Such an SDF
construct can be under the control of either a constitutive promoter or a highly regulated inducible promoter (e.g., a
copper inducible promoter). The promoter of interest can initially be either endogenous or heterologous to the species
in question. When re-introduced into the genome of said species, such promoter becomes exogenous to said species.
Over-expression of an SDF transgene can lead to co-suppression of the homologous endogenous sequence thereby
creating some alterations in the phenotypes of the transformed species as demonstrated by similar analysis of the
chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)). If an
SDF is found to encode a protein with desirable characteristics, its over-production can be controlled so that its
accumulation can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.

[2295] Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to be tissue-specific or
developmentally regulated, such a promoter can be utilized to drive or facilitate the transcription of a specific gene of
interest (e.g., seed storage protein or root-specific protein). Thus, the level of accumulation of a particular protein can
be manipulated or its spatial localization in an organ- or tissue-specific manner can be altered.

IV.C. Signal Peptides

[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ TABLES. In some
cases it may be desirable for the protein encoded by an introduced exogenous or orthologous SDF to be targeted (1)
to a particular organelle or intracellular compartment, (2) to interact with a particular molecule such as a membrane
molecule or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal
peptide.

[2297] Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell to cell
communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of
several different intracellular compartments. In plants, these compartments include, but are not limited to, the
endoplasmic reticulum (ER), mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein storage
vesicles (PSV) and, in general, membranes. Some signal peptide sequences are conserved, such as the Asn-Pro-
Ile-Arg amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole (Marty (1999)
The Plant Cell 11: 587-599). Other signal peptides do not have a consensus sequence per se, but are largely composed
of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke (1999)
The Plant Cell 11: 615-628). Still others do not appear to contain either a consensus sequence or an identified common
secondary sequence, for instance the chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The
Plant Cell 11: 557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an organelle and
then to a membrane within the organelle (e.g. within the thylakoid lumen of the chloroplast; see Keegstra and Cline
(1999) The Plant Cell 11: 557-570). In addition to the diversity in sequence and secondary structure, placement of the
signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at
the N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal peptides also serve as
ligands for some receptors.

[2298] These characteristics of signal proteins can be used to more tightly control the phenotypic expression of
introduced SDFs. In particular, associating the appropriate signal sequence with a specific SDF can allow sequestering
of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with
particular receptors, etc. Hence, the inclusion of signal proteins in constructs involving the SDFs of the invention
increases the range of manipulation of SDF phenotypic expression. The nucleotide sequence of the signal peptide can
be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro.

[2299] In addition, the native signal peptide sequences, both amino acid and nucleotide, described in the REF and
SEQ tables can be used to modulate polypeptide transport. Further variants of the native signal peptides described in
the REF and SEQ tables are contemplated. Insertions, deletions, or substitutions can be made. Such variants will
retain at least one of the functions of the native signal peptide as well as exhibiting some degree of sequence identity
to the native sequence.

[2300] Also, fragments of the signal peptides of the invention are useful and can be fused with other signal peptides
of interest to modulate transport of a polypeptide.

V. Transformation Techniques 

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells,
including, without limitation, bacterial, yeast, mammalian, insect and plant cells.

[2302] Techniques for transforming a wide variety of higher plant species are well known and described in the
technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 22:421 (1988); and Christou, Euphytica, v. 85,
n. 1-3: 13-27 (1995).

[2303] DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of
conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the
plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs
can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively,
the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional
Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the
insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria (McCormac
et al., Mol. Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141 (1984); Herrera-
Estrella et al., EMBO J. 2:987 (1983)).

[2304] Microinjection techniques are known in the art and well described in the scientific and patent literature. The
introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al., EMBO J. 3:
2717 (1984). Electroporation techniques are described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985).
Ballistic transformation techniques are described in Klein et al., Nature 327:773 (1987). Agrobacterium tumefaciens
mediated transformation techniques, including disarming and use of binary or cointegrate vectors, are well described
in the scientific literature. See, for example, Hamilton, C.M., Gene 200:107 (1997); Muller et al., Mol. Gen. Genet. 207:
171 (1987); Komari et al., Plant J. 20:165 (1996); Venkateswarlu et al., Biotechnology 9:1103 (1991) and Gleave, A.P.,
Plant Mol. Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986) and Gould et al., Plant Physiology
95:426 (1991).

[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to
regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype such as
seedlessness. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium,
typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide
sequences. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation and Culture
in "Handbook of Plant Cell Culture," pp. 124-176, MacMillan Publishing Company, New York, 1983; and Binding,
Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1988. Regeneration can also be obtained
from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et
al., Ann. Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama et al. (Biosci.
Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. (1994)). The nucleic acids of the
invention can be used to confer desired traits on essentially any plant.

[2306] Thus, the invention has use over a broad range of plants, including species from the genera Anacardium,
Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis,
Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Hemerocallis, Hordeum, Hyoscyamus, Lactuca,
Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum,
Pennisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis,
Solanum, Sorghum, Theobroma, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.
[2307] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and
confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard
breeding techniques can be used, depending upon the species to be crossed.

[2308] The particular sequences of SDFs identified are provided in the attached REF AND SEQ TABLES 1 AND 2.
One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic DNA fragments or
polypeptides constituting desired sequences by recombinant methodology known in the art or described herein.

EXAMPLES

[2309] The invention is illustrated by way of the following examples. The invention is not limited by these examples
as the scope of the invention is defined solely by the claims following.

EXAMPLE 1: cDNA PREPARATION

[2310] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants
grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O.
Box 256, Johnston, Iowa 50131-0256.

[2311] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative
of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana,
Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA from such plants as described below. This is a true breeding
strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio State University,
under the accession number CS2360. Seeds of this plant were deposited under the terms and conditions of the
Budapest Treaty at the American Type Culture Collection, Manassas, VA on August 31, 1999, and were assigned ATCC
No. PTA-595.

[2312] Other methods for cloning full-length cDNA are described, for example, by Seki et al., Plant Journal 15:707-720
(1998) "High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated CAP trapper"; Maruyama et al., Gene
138:171 (1994) "Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with
oligoribonucleotides"; and WO 96/34981.

[2313] Tissues were, or each organ was, individually pulverized and frozen in liquid nitrogen. Next, the samples were
homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample
and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the
sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents
and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues
was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root,
respectively. The pooled material was then used for cDNA synthesis by the methods described below.
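
The 15/15/1 pooling ratio above can be turned into per-tissue masses for any chosen pool size. The following is a minimal sketch only; the function name, the dictionary keys and the 310 microgram pool size are hypothetical and are not taken from the disclosure.

```python
def pool_masses(total_ug: float, ratio: dict) -> dict:
    """Split a total RNA mass (micrograms) across tissues by mass ratio."""
    parts = sum(ratio.values())
    if parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    return {tissue: total_ug * share / parts for tissue, share in ratio.items()}


if __name__ == "__main__":
    # Mass ratio stated in the text: male / female inflorescences / root.
    corn_ratio = {"male_inflorescence": 15, "female_inflorescence": 15, "root": 1}
    # 310 micrograms is an assumed pool size chosen for round numbers.
    for tissue, mass in pool_masses(310.0, corn_ratio).items():
        print(f"{tissue}: {mass:.1f} ug")
```

The same helper applies to any other mass ratio, for example a nine-to-one pool of two tissues.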
[2314] Starting material for cDNA synthesis for the exemplary corn cDNA clones with sequences presented in REF
AND SEQ TABLES 1 AND 2 was poly(A)-containing polysomal mRNAs from inflorescences and root tissues of corn
plants grown from HYBRID SEED # 35A19. Male inflorescences and female (pre- and post-fertilization) inflorescences
were isolated at various stages of development. Selection for poly(A)-containing polysomal RNA was done using oligo
d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach", pp. 1-35,
Shaw ed., c. 1988 by IRL, Oxford. The quality and the integrity of the polyA+ RNAs were evaluated.
[2315] Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA clones with sequences presented
in REF AND SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues of Arabidopsis
thaliana Wassilewskija (Ws.) and from roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the
Arabidopsis Biological Resource Center. Nine parts inflorescence to every part root was used, as measured by wet
mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the sample was homogenized in the presence of
detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were
added to the sample. The sample was centrifuged and the debris was removed and the sample was applied to a 2M
sucrose cushion to isolate polysomal RNA. Cox et al., "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw
ed., c. 1988 by IRL, Oxford. The polysomal RNA was used for cDNA synthesis by the methods described below.
Polysomal mRNA was then isolated as described above for corn cDNA. The quality of the RNA was assessed
electrophoretically.

[2316] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact
5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNA was performed using either a
chemical or enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which
characterizes the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once, at the 7
position.

[2317] The chemical modification approach involves the optional elimination of the 2', 3'-cis diol of the 3' terminal
ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde,
and the coupling of the dialdehyde so obtained to a derivatized oligonucleotide tag. Further detail regarding the
chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No.
WO96/34981 published November 7, 1996.

[2318] The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of mRNAs involves the removal
of the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs
having intact 5' ends and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide
tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is disclosed in
Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, Le clonage des ADNc complets: difficultés et
perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat, 20
Dec. 1993), EPO 625572 and Kato et al., Gene 150:243-250 (1994).

[2319] In both the chemical and the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g.
an EcoRI site) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the
mRNA, the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the
oligonucleotide tag.

[2320] For the mRNAs joined to oligonucleotide tags using either the chemical or the enzymatic method, first strand
cDNA synthesis is performed using an oligo-dT primer with reverse transcriptase. This oligo-dT primer can contain an
internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another. Methylated dCTP
is used for cDNA first strand synthesis to protect the internal EcoRI sites from digestion during subsequent steps. The
first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis to eliminate residual
primers.

[2321] Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow fragment, and a primer
corresponding to the 5' end of the ligated oligonucleotide. The primer is typically 20-25 bases in length. Methylated
dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during
the cloning process.

[2322] Following second strand synthesis, the full-length cDNAs are cloned into a phagemid vector, such as
pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with T4 DNA polymerase (Biolabs) and the cDNA
is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag is the
only hemi-methylated site; hence the only site susceptible to EcoRI digestion. In some instances, to facilitate
subcloning, a Hind III adapter is added to the 3' end of full-length cDNAs.
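
To see which EcoRI sites methylated dCTP would need to protect in a given cDNA, one can scan the sequence for the EcoRI recognition sequence GAATTC. The following is a minimal sketch under stated assumptions: the function name and the example sequence are hypothetical, and the code only locates candidate sites; it does not model methylation or digestion.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (palindromic)


def find_sites(seq: str, site: str = ECORI_SITE) -> list:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step by one so overlapping occurrences would also be reported.
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Illustrative sequence only; it is not taken from the SEQ tables.
    cdna = "AAGAATTCTTGAATTC"
    print(find_sites(cdna))
```

Any position reported inside the cDNA insert, as opposed to the oligonucleotide tag, is an internal site of the kind the methylated dCTP step is intended to shield from EcoRI.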

[2323] The full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or
electrophoretic separation, which yields 3 to 6 different fractions. The full-length cDNAs are then directionally cloned
into pBlueScript™ using either the EcoRI and SmaI restriction sites or, when the Hind III adapter is present in
the full-length cDNAs, the EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by
electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.
[2324] Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows.

[2325] The plasmid cDNA libraries made as described above are purified (e.g. by a column available from Qiagen).
A positive selection of the tagged clones is performed as follows. Briefly, in this selection procedure, the plasmid DNA
is converted to single stranded DNA using phage F1 gene II endonuclease in combination with an exonuclease (Chang
et al., Gene 127:95 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single stranded DNA is
then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992). Here the single
stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the
oligonucleotide tag. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary
to the biotinylated oligonucleotide are selected by incubation with streptavidin coated magnetic beads followed by
magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and
converted into double stranded DNA using a DNA polymerase such as ThermoSequenase™ (obtained from Amersham
Pharmacia Biotech). Alternatively, protocols such as the Gene Trapper™ kit (Gibco BRL) can be used. The double
stranded DNA is then transformed, preferably by electroporation, into bacteria. The percentage of positive clones
having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.

[2326] Following transformation, the libraries are ordered in microtiter plates and sequenced. The Arabidopsis library
was deposited at the American Type Culture Collection on January 7, 2000 as "E-coli liba 010600" under the accession
number PTA-1161.

EXAMPLE 2: SOUTHERN HYBRIDIZATIONS 

[2327] The SDFs of the invention can be used in Southern hybridizations as described above. The following describes
extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation by length, transfer of the
separated fragments to membranes, preparation of probes for hybridization, hybridization and detection of the
hybridized probe.

[2328] The procedures described herein can be used to isolate related polynucleotides or for diagnostic purposes.
Moderate stringency hybridization conditions, as defined above, are described in the present example. These
conditions result in detection of hybridization between sequences having at least 70% sequence identity. As described above,
the hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity between
probe and target sequences that can be detected.
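
The 70% figure above refers to sequence identity between probe and target. As a rough illustration of what that percentage means, the following minimal sketch counts matching positions between two pre-aligned sequences of equal length. It is an assumption-laden simplification: the function names are hypothetical, gaps are not handled, and the calculation does not predict hybridization behavior, which also depends on salt, temperature and probe length.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold: float = 70.0) -> bool:
    """True when the identity is at least `threshold` percent (70% in the text)."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    # Illustrative sequences only; they are not taken from the SEQ tables.
    probe = "ACGTACGTAC"
    target = "ACGTACGTTT"
    print(percent_identity(probe, target), meets_threshold(probe, target))
```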

[2329] In the following procedure, a probe for hybridization is produced from two PCR reactions using two primers
from genomic sequence of Arabidopsis thaliana. As described above, the particular template for generating the probe
can be any desired template.

[2330] The first PCR product is assessed to confirm that it is of the expected size. Then
the product of the first PCR is used as a template, with the same pair of primers used in the first PCR, in a second
PCR that produces a labeled product used as the probe.

[2331] Fragments detected by hybridization, or other bands of interest, can be isolated from gels used to separate
genomic DNA fragments by known methods for further purification and/or characterization.

Buffers for nuclear DNA extraction

[2332]

1. 10X HB

                             per 1000 ml
   40 mM spermidine          10.2 g        Spermine (Sigma S-2876) and spermidine (Sigma S-2501)
   10 mM spermine            3.5 g         stabilize chromatin and the nuclear membrane
   0.1 M EDTA (disodium)     37.2 g        EDTA inhibits nuclease
   0.1 M Tris                12.1 g        Buffer
   0.8 M KCl                 59.6 g        Adjusts ionic strength for stability of nuclei

   Adjust pH to 9.5 with 10 N NaOH. It appears that there is a nuclease present in leaves. Use of pH 9.5 appears
   to inactivate this nuclease.
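
The gram amounts in the 10X HB recipe follow from molarity times formula weight times volume. The sketch below checks three of the entries. The formula weights are assumptions (commonly quoted values for Tris base, KCl and disodium EDTA dihydrate); the text does not state which salt forms or formula weights were used.

```python
# Formula weights in g/mol. These are assumed values, not taken from the text.
FORMULA_WEIGHT = {
    "Tris base": 121.14,
    "KCl": 74.55,
    "EDTA disodium dihydrate": 372.24,
}

# Target molar concentrations from the 10X HB recipe.
TARGET_MOLARITY = {
    "Tris base": 0.1,
    "KCl": 0.8,
    "EDTA disodium dihydrate": 0.1,
}


def grams_needed(molarity: float, formula_weight: float, volume_l: float = 1.0) -> float:
    """Mass of solute (g) = mol/L x g/mol x L."""
    return molarity * formula_weight * volume_l


for name, molarity in TARGET_MOLARITY.items():
    grams = grams_needed(molarity, FORMULA_WEIGHT[name])
    print(f"{name}: {grams:.1f} g per 1000 ml")
```

With these assumed formula weights the computed masses round to the 12.1 g, 59.6 g and 37.2 g listed in the recipe.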




2. 2 M sucrose (684 g per 1000 ml)

   Heat about half the final volume of water to about 50°C. Add the sucrose slowly, then bring the mixture to close
   to final volume; stir constantly until it has dissolved. Bring the solution to volume.

3. Sarkosyl solution (lyses nuclear membranes)

                                       per 1000 ml
   N-lauroyl sarcosine (Sarkosyl)      20.0 g
   0.1 M Tris                          12.1 g
   0.04 M EDTA (disodium)              14.9 g

   Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.



4. 20% Triton X-100

   80 ml Triton X-100
   320 ml 1X HB (w/o β-ME and PMSF)

   Prepare in advance; Triton takes some time to dissolve.

A. Procedure

[2333] 

1. Prepare 1X HB buffer (keep ice-cold during use)

                          per 1000 ml
   10X HB                 100 ml
   2 M sucrose            250 ml       a non-ionic osmoticum
   Water                  634 ml

   Added just before use:

   100 mM PMSF*           10 ml        a protease inhibitor; protects nuclear membrane proteins
   β-mercaptoethanol      1 ml         inactivates nuclease by reducing disulfide bonds

   *100 mM PMSF (phenyl methyl sulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml 100% ethanol.



2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender). Be sure that you use 5-10 ml of HB
buffer per gram of tissue. Blenders generate heat, so be sure to keep the homogenate cold. It is necessary to put
the blenders in ice periodically.

3. Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 20 min. This lyses plastid,
but not nuclear, membranes.

4. Filter the tissue suspension through several nylon filters into an ice-cold beaker. The first filtration is through a
250-micron membrane; the second is through an 85-micron membrane; the third is through a 50-micron membrane;
and the fourth is through a 20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up
by gently squeezing the liquid through the filters.

5. Centrifuge the filtrate at 1200 x g for 20 min. at 4°C to pellet the nuclei.

6. Discard the dark green supernatant. The pellet will have several layers to it. One is starch; it is white and gritty.
The nuclei are gray and soft. In the early steps, there may be a dark green and somewhat viscous layer of
chloroplasts.

   Wash the pellets in about 25 ml cold H buffer (with Triton X-100) and resuspend by swirling gently and pipetting.
   After the pellets are resuspended, pellet the nuclei again at 1200-1300 x g. Discard the supernatant.

   Repeat the wash 3-4 times until the supernatant has changed from a dark green to a pale green. This usually
   happens after 3 or 4 resuspensions. At this point, the pellet is typically grayish white and very slippery. The
   Triton X-100 in these repeated steps helps to destroy the chloroplasts and mitochondria that contaminate the
   prep. Resuspend the nuclei for a final time in a total of 15 ml of H buffer and transfer the suspension to a sterile
   125 ml Erlenmeyer flask.






7. Add 15 ml, dropwise, of cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) while swirling gently. This
lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in solution. The mixture will be gray,
white and viscous.

9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin is, the firmer the protein pellicle.

10. The result is typically a clear green supernatant over a white pellet, and (perhaps) under a protein pellicle.
Carefully remove the solution under the protein pellicle and above the pellet. Determine the density of the solution
by weighing 1 ml of solution and add CsCl if necessary to bring to 1.57 g/ml. The solution contains dissolved solids
(sucrose etc.) and the refractive index alone will not be an accurate guide to CsCl concentration.

11. Add 20 µl of 10 mg/ml EtBr per ml of solution.






12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor.

13. Remove the dark red supernatant that is at the top of the tube with a plastic transfer pipette and discard.
Carefully remove the DNA band with another transfer pipette. The DNA band is usually visible in room light;
otherwise, use a long wave UV light to locate the band.

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once the solution is clear, extract
at least two more times to ensure that all of the EtBr is gone. Be very gentle, as it is very easy to shear the DNA
at this step. This extraction may take a while because the DNA solution tends to be very viscous. If the solution is
too viscous, dilute it with TE.

15. Dialyze the DNA for at least two days against several changes (at least three times) of TE (10 mM Tris, 1 mM
EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a lot of debris, centrifuge the
DNA solution at least at 2500 x g for 10 min. and carefully transfer the clear supernatant to a new tube. Read the
A260 to determine the concentration of the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel). Load 50 ng and
100 ng (based on the OD reading) and compare with known, good quality DNA. Undigested lambda DNA
and lambda-HindIII-digested DNA are good molecular weight markers.

Protocol for Digestion of Genomic DNA

Protocol:
[2334] 

1. The relative amounts of DNA for different crop plants that provide approximately a balanced number of genome
equivalents are given in Table 3. Note that due to the size of the wheat genome, wheat DNA will be underrepresented.
Lambda DNA provides a useful control for complete digestion.

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA
can be purchased and made up at the necessary concentration; therefore no precipitation is necessary for yeast
DNA.

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully (be careful not to disturb the pellet).
Be sure that the residual ethanol is completely removed either by vacuum desiccation or by carefully wiping the
sides of the tubes with a clean tissue.

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully resuspended before proceeding
to the next step. This may take about 30 min.

5. Add the appropriate volume of 10X reaction buffer provided by the manufacturer of the restriction enzyme to
the resuspended DNA, followed by the appropriate volume of enzymes. Be sure to mix it properly by slowly swirling
the tubes.

6. Set up the lambda digestion control for each DNA that you are digesting.

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down condensation in a microfuge
before proceeding.

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% xylene cyanol in 15% Ficoll
or 30% glycerol) to the lambda-control digests and load in 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2
mM EDTA, pH 8). If the lambda DNA in the lambda control digests is completely digested, proceed with the
precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at -20°C for at least 2 hours
(preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they don't have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume
of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be careful in pipetting the loading dye; it is viscous. Be sure
you are pipetting the correct volume.



Table 3

Some guide points in digesting genomic DNA.

Species        Genome Size    Size Relative     Genome Equivalent to     Amount of DNA
                              to Arabidopsis    2 µg Arabidopsis DNA     per blot
Arabidopsis    120 Mb         1X                1X                       2 µg
Brassica       1,100 Mb       9.2X              0.54X
Corn           2,800 Mb       23.3X             0.43X                    20 µg
Cotton         2,300 Mb       19.2X             0.52X                    20 µg
Oat            11,300 Mb      94X               0.11X                    20 µg
Rice           400 Mb         3.3X              0.75X                    5 µg
Soybean        1,100 Mb       9.2X              0.54X                    10 µg
Sugarbeet      758 Mb         6.3X              0.8X                     10 µg
Sweetclover    1,100 Mb       9.2X              0.54X                    10 µg
Wheat          16,000 Mb      133X              0.08X                    20 µg
Yeast          15 Mb          0.12X             1X                       0.25 µg
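
The "Genome Equivalent" column of Table 3 appears to be the amount of DNA loaded divided by the amount that would carry as many genome copies as 2 µg of Arabidopsis DNA. The sketch below reproduces a few rows under that reading; the formula is inferred from the table values rather than stated in the text, and the function and constant names are illustrative.

```python
ARABIDOPSIS_GENOME_MB = 120.0   # reference genome size from Table 3
REFERENCE_AMOUNT_UG = 2.0       # reference Arabidopsis DNA amount per blot


def genome_equivalent(genome_mb: float, dna_ug: float) -> float:
    """Genome copies loaded, relative to 2 ug of Arabidopsis DNA."""
    relative_size = genome_mb / ARABIDOPSIS_GENOME_MB
    # DNA needed for the same number of genome copies scales with genome size.
    amount_for_one_equivalent = relative_size * REFERENCE_AMOUNT_UG
    return dna_ug / amount_for_one_equivalent


# (species, genome size in Mb, DNA per blot in ug) taken from Table 3
rows = [("Corn", 2800, 20), ("Rice", 400, 5), ("Soybean", 1100, 10), ("Yeast", 15, 0.25)]
for species, size_mb, amount_ug in rows:
    print(f"{species}: {genome_equivalent(size_mb, amount_ug):.2f}X")
```

The same calculation shows why wheat is underrepresented: matching one genome equivalent would take far more DNA than the 20 µg loaded per blot.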



Protocol for Southern Blot Analysis

[2335] The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. Low voltage, overnight
separations are preferred. The gels are stained with EtBr and photographed.



1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

2. Then briefly rinse with water. The DNA is denatured by two incubations (with shaking) in 0.5 M NaOH
in 1.5 M NaCl for 15 min each.

3. The gel is then briefly rinsed in water and neutralized by incubating twice (with shaking) in 1.5 M Tris pH 7.5 in
1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min.
before use. (20X SSC is 175.3 g NaCl, 88.2 g sodium citrate per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted
from the gel to the membrane using an absorbent medium, such as paper toweling, and 6X SSC buffer. After the
transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The membrane is stored at 4°C
until use.

B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures:

[2336]



1. Mix the following in a 0.20 ml PCR tube or 96-well PCR plate:

Volume        Stock                                                  Final Amount or Conc.
0.5 µl        ~10 ng/µl genomic DNA                                  5 ng
2.5 µl        10X PCR buffer                                         20 mM Tris, 50 mM KCl
0.75 µl       50 mM MgCl2                                            1.5 mM
1 µl          10 pmol/µl Primer 1 (Forward)                          10 pmol
1 µl          10 pmol/µl Primer 2 (Reverse)                          10 pmol
0.5 µl        5 mM dNTPs                                             0.1 mM
0.1 µl        5 units/µl Platinum Taq™ (Life Technologies,           1 unit
              Gaithersburg, MD) DNA Polymerase
(to 25 µl)    Water
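
The final concentrations in the reaction mix follow from the usual dilution relation C1 x V1 = C2 x V2 for a 25 µl reaction. The following sketch checks the MgCl2 and dNTP rows and computes the water volume needed to reach 25 µl; the function and variable names are illustrative only.

```python
TOTAL_VOLUME_UL = 25.0


def final_concentration(stock_conc: float, stock_volume_ul: float,
                        total_volume_ul: float = TOTAL_VOLUME_UL) -> float:
    """C2 = C1 * V1 / V2, in the same units as the stock concentration."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Component volumes (ul) as listed in the reaction mix.
component_volumes_ul = {
    "genomic DNA": 0.5,
    "10X PCR buffer": 2.5,
    "50 mM MgCl2": 0.75,
    "Primer 1": 1.0,
    "Primer 2": 1.0,
    "5 mM dNTPs": 0.5,
    "Taq polymerase": 0.1,
}

mgcl2_mm = final_concentration(50.0, component_volumes_ul["50 mM MgCl2"])
dntp_mm = final_concentration(5.0, component_volumes_ul["5 mM dNTPs"])
water_ul = TOTAL_VOLUME_UL - sum(component_volumes_ul.values())

print(f"MgCl2: {mgcl2_mm:.2f} mM, dNTPs: {dntp_mm:.2f} mM, water: {water_ul:.2f} ul")
```

For a multi-well format, each volume can be multiplied by the number of reactions (plus a small excess) to prepare a master mix.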





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

   1) 94°C for 10 min. followed by

   2) 5 cycles:     94°C - 30 sec
                    62°C - 30 sec
                    72°C - 3 min

   3) 5 cycles:     94°C - 30 sec
                    58°C - 30 sec
                    72°C - 3 min

   4) 25 cycles:    94°C - 30 sec
                    53°C - 30 sec
                    72°C - 3 min

   5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
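
The cycling program steps the annealing temperature down from 62°C to 58°C to 53°C. As a minimal sketch, the program can be written as plain data and the programmed hold time summed; the data layout is an assumption made for illustration, and ramp times between temperatures are not included.

```python
# Each stage: (number of cycles, [(temperature in C, hold time in seconds), ...])
AMPLIFICATION_PROGRAM = [
    (1, [(94, 600)]),                          # initial denaturation, 10 min
    (5, [(94, 30), (62, 30), (72, 180)]),
    (5, [(94, 30), (58, 30), (72, 180)]),
    (25, [(94, 30), (53, 30), (72, 180)]),
    (1, [(72, 420)]),                          # final extension, 7 min
]


def total_hold_minutes(program) -> float:
    """Sum of programmed hold times, ignoring ramping between temperatures."""
    seconds = sum(cycles * sum(hold for _, hold in steps) for cycles, steps in program)
    return seconds / 60.0


def amplification_cycles(program) -> int:
    """Number of three-step (denature, anneal, extend) cycles."""
    return sum(cycles for cycles, steps in program if len(steps) == 3)


print(total_hold_minutes(AMPLIFICATION_PROGRAM), amplification_cycles(AMPLIFICATION_PROGRAM))
```

The final 4°C chill is a hold of open-ended length and is therefore left out of the total.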



[2337] The procedure can be adapted to a multi-well format if necessary.

Quantification and Dilution of PCR Products:

[2338]

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be
used as a quantification standard (usually at 50, 100, 200, and 400 ng). These will be used as references to
approximate the amount of PCR products. HindIII-digested lambda DNA is useful as a molecular weight marker.
The gel can be run fairly quickly; e.g., at 100 volts. The standard gel is examined to determine that the size of the
PCR products is consistent with the expected size and whether there are significant extra bands or smeary products in
the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard.

3. For the small number of reactions that produce extraneous bands, a small amount of DNA from bands with the
correct size can be isolated by dipping a sterile 10-µl tip into the band while viewing through a UV Transilluminator.
The small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.

C. Protocol for PCR-DIG-Labeling of DNA

Solutions:

[2339]

Reagents in PCR reactions (diluted PCR products, 10X PCR Buffer, 50 mM MgCl2, 5 U/µl Platinum Taq Polymerase,
and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)

10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875 mM dTTP, 0.125 mM DIG-11-dUTP)

TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to
adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min. and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, Cat. no. 1096176).
Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5). Prepared from autoclaved solutions of 1 M Tris pH
9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water.

Procedure:

[2340]

1. PCR reactions are performed in 25 µl volumes containing:

   PCR buffer                  1X
   MgCl2                       1.5 mM
   10X dNTP + DIG-11-dUTP      1X (please see the note below)
   Platinum Taq™ Polymerase    1 unit
   10 pg probe DNA
   10 pmol primer 1

   Note:
                                       Use for:
   10X dNTP + DIG-11-dUTP (1:5)        <1 kb
   10X dNTP + DIG-11-dUTP (1:10)       1 kb to 1.8 kb
   10X dNTP + DIG-11-dUTP (1:15)       >1.8 kb
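
The note above maps probe length to a labeling mix. A minimal sketch of that lookup, with the dTTP and DIG-11-dUTP concentrations of each 10X mix taken from the Solutions list, might look as follows. The function name is hypothetical, and the handling of probes of exactly 1 kb or 1.8 kb is an assumption, since the note does not say which mix applies at the boundaries.

```python
# 10X mix composition: label -> (dTTP mM, DIG-11-dUTP mM), from the Solutions list.
DIG_MIX_10X_MM = {
    "1:5": (1.65, 0.35),
    "1:10": (1.81, 0.19),
    "1:15": (1.875, 0.125),
}


def dig_mix_for_probe(length_bp: int) -> str:
    """Pick the 10X dNTP + DIG-11-dUTP mix by probe length."""
    if length_bp <= 0:
        raise ValueError("probe length must be positive")
    if length_bp < 1000:
        return "1:5"
    if length_bp <= 1800:   # boundary handling is an assumption of this sketch
        return "1:10"
    return "1:15"


for length in (600, 1500, 2400):
    mix = dig_mix_for_probe(length)
    dttp, dig_dutp = DIG_MIX_10X_MM[mix]
    print(f"{length} bp -> mix {mix} (dTTP {dttp} mM, DIG-11-dUTP {dig_dutp} mM)")
```

In each mix the dTTP and DIG-11-dUTP concentrations sum to 2 mM, matching the other three nucleotides, so longer probes simply receive a smaller labeled fraction.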



2. The PCR reaction uses the following amplification cycles:

   1) 94°C for 10 min.

   2) 5 cycles:     95°C - 30 sec
                    61°C - 1 min
                    73°C - 5 min

   3) 5 cycles:     95°C - 30 sec
                    59°C - 1 min
                    73°C - 5 min

   4) 25 cycles:    95°C - 30 sec
                    51°C - 1 min
                    73°C - 5 min

   5) 72°C for 8 min. The reactions are terminated by chilling (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, comparing to an aliquot of the unlabelled
probe starting material.

4. The amount of DIG-labeled probe is determined as follows:

   Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as
   shown in the following table:

   DIG-labeled control DNA      Stepwise Dilution       Final Conc.
   starting conc.                                       (Dilution Name)
   5 ng/µl                      1 µl in 49 µl TE        100 pg/µl (A)
   100 pg/µl (A)                25 µl in 25 µl TE       50 pg/µl (B)
   50 pg/µl (B)                 25 µl in 25 µl TE       25 pg/µl (C)
   25 pg/µl (C)                 20 µl in 30 µl TE       10 pg/µl (D)



   a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted onto a positively
   charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

   b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

   c. The membrane is fixed by UV crosslinking.

   d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution
   for 15 min at room temp.

   e. The labeled DNA is then detected using alkaline phosphatase conjugated anti-DIG antibody (Boehringer
   Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's
   instruction.

   f. Spot intensities of the control and experimental dilutions are then compared to estimate the concentration
   of the PCR-DIG-labeled probe.
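
The stepwise dilution of the DIG-labeled control DNA in step 4 can be checked with a short calculation. This is a sketch only; the function name and the data layout are illustrative.

```python
def diluted_concentration(conc: float, sample_ul: float, diluent_ul: float) -> float:
    """Concentration after mixing sample_ul of stock with diluent_ul of TE."""
    return conc * sample_ul / (sample_ul + diluent_ul)


START_PG_PER_UL = 5000.0  # 5 ng/ul control DNA expressed in pg/ul

# (dilution name, ul carried over from the previous tube, ul of TE)
STEPS = [("A", 1, 49), ("B", 25, 25), ("C", 25, 25), ("D", 20, 30)]

series = {}
conc = START_PG_PER_UL
for name, sample_ul, te_ul in STEPS:
    conc = diluted_concentration(conc, sample_ul, te_ul)
    series[name] = conc

print(series)
```

Spotting 1 µl of each dilution would then give the 100 pg to 10 pg range of standards described in step a; the spotted volume is an assumption, as the text does not state it.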

D. Prehybridization and Hybridization of Southern Blots 

Solutions:

[2341]

100% Formamide: purchased from Gibco

20X SSC (1X = 0.15 M NaCl, 0.015 M Na3citrate)
   per L:   175 g NaCl
            87.5 g Na3citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)

20% SDS (sodium dodecyl sulphate)

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust
the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust
the volume to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

Final Concentration    Components          Volume (per 100 ml)    Stock
50%                    Formamide           50 ml                  100%
5X                     SSC                 25 ml                  20X
0.1%                   Sarkosyl            0.5 ml                 20%
0.02%                  SDS                 0.1 ml                 20%
2%                     Blocking Reagent    20 ml                  10%
                       Water               4.4 ml
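
The volumes in the prehybridization mix follow from the ratio of final to stock concentration. The sketch below recomputes them for a 100 ml batch and derives the water volume by difference; the names are illustrative only.

```python
BATCH_ML = 100.0

# component -> (final concentration, stock concentration), in matching units
PREHYB_COMPONENTS = {
    "Formamide": (50.0, 100.0),        # percent
    "SSC": (5.0, 20.0),                # X
    "Sarkosyl": (0.1, 20.0),           # percent
    "SDS": (0.02, 20.0),               # percent
    "Blocking Reagent": (2.0, 10.0),   # percent
}


def stock_volume_ml(final: float, stock: float, batch_ml: float = BATCH_ML) -> float:
    """Volume of stock needed so that the batch reaches the final concentration."""
    return final / stock * batch_ml


volumes_ml = {name: stock_volume_ml(f, s) for name, (f, s) in PREHYB_COMPONENTS.items()}
volumes_ml["Water"] = BATCH_ML - sum(volumes_ml.values())

for name, vol in volumes_ml.items():
    print(f"{name}: {vol:.1f} ml")
```

Changing `BATCH_ML` scales the recipe, for example to the 30 ml used per 100 cm² of membrane.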





General Procedures:

[2342]

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of prehybridization solution (30 ml/
100 cm²) at room temperature. Seal the bag with a heat sealer, avoiding bubbles as much as possible. Lay down
the bags in a large plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are lying flat
in the tray so that the prehybridization solution is evenly distributed throughout the bag. Incubate the blot for at
least 2 hours with gentle agitation using a waver shaker.

2. Denature DIG-labeled DNA probe by incubating for 10 min. at 98°C using the PCR machine and immediately
cool it to 4°C.

3. Add probe to prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and mix well but avoid foaming.
Bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add new prehybridization and probe
solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to medium stringency post-hybridization wash:

   Three times for 20 min. each with gentle agitation using 1X SSC, 1% SDS at 60°C.

   All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.

   To avoid background keep the membranes fully submerged to avoid drying in spots; agitate sufficiently to
   avoid having membranes stick to one another.

7. After the wash, proceed to immunological detection and CSPD development.

E. Procedure for Immunological Detection with CSPD

Solutions:

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in Buffer 1. Dissolve blocking reagent powder
(Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by constantly stirring
on a 65°C heating block or heat in a microwave, autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5



Procedure:

[2344]

1. After the post-hybridization wash the blots are briefly rinsed (1-5 min.) in the maleate washing buffer with gentle
shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking.

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) at 75 mU/ml (1:10,000) in
Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking.

5. The membranes are washed twice in washing buffer with gentle shaking. About 250 ml is used per wash for 3
blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer.

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and stored in the dark at 4°C.)
The following steps must be done individually. Bags (one for detection and one for exposure) are generally cut
and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid removed without drying the membrane.
The blot is immediately placed in a bag and 1.5 ml of CSPD solution is added. The CSPD solution can be spread
over the membrane. Bubbles present at the edge and on the surface of the blot are typically removed by gentle
rubbing. The membrane is incubated for 5 min. in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on Whatman 3MM paper. Do not
let the membrane dry completely.

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to enhance the luminescent
reaction.

11. Expose for 2 hours at room temperature to X-ray film. Multiple exposures can be taken. Luminescence continues
for at least 24 hours and signal intensity increases during the first hours.
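
The antibody and CSPD working solutions in the procedure above are simple ratio dilutions. As a minimal sketch, the stock volume needed for a given working volume can be computed as follows, reading "1:N" as one part stock in N parts final volume (an assumption; the text does not define the convention). The function name is illustrative.

```python
def stock_volume_ul(final_volume_ml: float, dilution_factor: float) -> float:
    """Microliters of stock needed for final_volume_ml at a 1:dilution_factor dilution."""
    return final_volume_ml * 1000.0 / dilution_factor


# 75 ml of antibody solution at 1:10,000 (enough for 3 blots, per step 3)
antibody_ul = stock_volume_ul(75.0, 10_000)

# 1.5 ml of CSPD solution at 1:200 (volume used per blot in step 8)
cspd_ul = stock_volume_ul(1.5, 200)

print(f"Anti-DIG-AP conjugate: {antibody_ul} ul; CSPD: {cspd_ul} ul")
```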

EXAMPLE 3: TRANSFORMATION OF CARROT CELLS

[2345] Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly,
a number of plant genera can be regenerated from tissue culture following transformation. Transformation and
regeneration of carrot cells as described herein is illustrative.

[2346] Single cell suspension cultures of carrot (Daucus carota) cells are established from hypocotyls of cultivar
Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2
(B5-44 medium) by methods known in the art. The suspension cultures are subcultured by adding 10 ml of the
suspension culture to 40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at
27°C in the dark.

[2347] The suspension culture cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol.
Bio. 36:163 (1998). Briefly, 4-days post-subculture cells are incubated with cell wall digestion solution containing 0.4
M sorbitol, 2% driselase, 5 mM MES (2-[N-Morpholino] ethanesulfonic acid) pH 5.0 for 5 hours. The digested cells are
pelleted gently at 60 x g for 5 min. and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2
and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.5
M mannitol, pH 5.7 and the protoplast density is adjusted to about 4x10^ protoplasts per ml.

[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40%
polyethylene glycol (MW 8000, PEG 8000), by gentle inversion a few times at room temperature for 5 to 25 min.
Protoplast culture medium known in the art is added into the PEG-DNA-protoplast mixture. Protoplasts are incubated
in the culture medium for 24 hours to 5 days and cell extracts can be used for assay of transient expression of the
introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used
to produce transgenic plants, by methods known in the art. See, for example, Nomura and Komamine, Plant Physiol. 79:
988-991 (1985), Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension
Cultures.

[2349] The invention being thus described, it will be apparent to one of ordinary skill in the art that various
modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered
within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated
in its entirety by such citation.


Claims 

1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino
acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least
65% sequence identity to a gene comprising

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one
of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence
order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a
sequence selected from the group consisting of:

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and

(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table
1 or 2;

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C and 48°C below
the melting temperature of the nucleic acid duplex.

6. The nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading
frame.

7. The isolated nucleic acid molecule of any one of claims 1-5, wherein said nucleic acid is capable of functioning as
a promoter, a 3' end termination sequence, an untranslated region (UTR), or as a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a promoter and comprises a sequence
selected from the group consisting of a TATA box sequence, a CAAT box sequence, a motif of GCAATCG or any
transcription-factor binding sequence, and any combination thereof.

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid sequence is a regulatory sequence which
is capable of promoting seed-specific expression, embryo-specific expression, ovule-specific expression, tapetum-specific
expression or root-specific expression of a sequence or any combination thereof.

10. A vector construct comprising a nucleic acid molecule according to any one of claims 1-9, wherein said nucleic
acid molecule is heterologous to any element in said vector construct.

11. A vector construct according to claim 10 comprising:

(a) a first nucleic acid having a regulatory sequence capable of causing transcription and/or translation; and

(b) a second nucleic acid having the sequence of said isolated nucleic acid molecule according to any one of
claims 1-4;

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is
heterologous to any element in said vector construct.

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic
acid.

14. A vector construct according to claim 10 comprising:

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim
7; and

(d) a second nucleic acid;

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous
to any element in said vector construct.

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid.

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic
acid.

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic
acid molecule is flanked by exogenous sequence.

18. A host cell comprising a vector construct of any one of claims 10-16.

19. An isolated polypeptide comprising an amino acid sequence

(a) exhibiting at least 40% sequence identity of an amino acid sequence encoded by a sequence shown in
REF and/or SEQ Table 1 or 2 or a fragment thereof; and

(b) capable of exhibiting at least one of the biological activities of the polypeptide encoded by said nucleotide
sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity
to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22.

24. A method of introducing an isolated nucleic acid into a host cell comprising:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) contacting said isolated nucleic with said host cell under conditions that permit insertion of said nucleic
acid into said host cell.

25. A method of transforming a host cell which comprises contacting a host cell with a vector construct according to
any one of claims 10-16.

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising:

(a) providing the host cell of claim 24 or 25; and

(b) culturing said host cell under conditions that permit transcription or translation.

27. A method for detecting a nucleic acid in a sample which comprises:

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5;

(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison
of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison.

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted
under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

29. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4 which is
exogenous to said plant or plant cell.

30. A plant or cell of a plant which comprises a nucleic acid molecule according to any one of claims 1-4, wherein said
nucleic acid molecule is heterologous to said plant or said cell of a plant.

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33.
